Implementation of Digital Filter by Using FPGA

Department of Electrical and Computer System
Engineering
Implementation of digital filter by using FPGA
by
Yishu Wang
A thesis submitted for the degree of
Bachelor of Engineering in Computer Systems Engineering

School of Electrical and Computer Engineering
Title: Implementation of digital filter by using FPGA
Author: Yishu Wang
Family name: Wang
Given name: Yishu
Date Supervisor
Dr Yee Hong Leung, Dr Cesar Ortega-Sanchez
Degree Option
Bachelor of Engineering Computer System
Abstract
Implementing hardware design in Field Programmable Gate Arrays (FPGAs) is a formidable task. There
is more than one way to implement the digital FIR filter. Based on the design specification, careful choice
of implementation method and tools can save a lot of time and work. MatLab is an excellent tool to design
filters. There are toolboxes available to generate VHDL descriptions of the filters which reduce
dramatically the time required to generate a solution. Time can be spent evaluating different
implementation alternatives. Computation algorithms are required that exploit the FPGA architecture to
make it efficient in terms of speed and/or area.
Indexing Terms
FPGAs, FIR Filter, Distributed Arithmetic
Good Average Poor
Technical Work
Report Presentation
Examiner Co-Examiner
YishuWang
56 Lowan Loop
Karawara
Western Australia, 6152
Friday, 10 June 2005
Professor Syed Islam,

Head of Department,
Department of Electrical and Computer Engineering,
Curtin University of Technology,
Bentley, 6102, Australia.
Dear Professor Islam
In order to partially satisfy the requirements for the Bachelor of Engineering in

Computer Systems Engineering degree, I present the following thesis entitled
“Implementation of digital filters by using FPGAs”. I declare that this thesis is
entirely my own work except where credited otherwise.
Yours sincerely,
Yishu Wang
ACKNOWLEDGEMENTS
I would like to express my appreciation to my project supervisors Dr Yee Hong Leung
Dr Cesar Ortega-Sanchez and Dr Doug Myers for their patience and helpful suggestions
throughout the life of the project. Their knowledge and support throughout this project
has made this work possible. Their knowledge has proven invaluable in progressing
through the work.
I
Synopsis
Implementing hardware design in Field Programmable Gate Arrays (FPGAs) is a
formidable task. There is more than one way to implement the digital FIR filter.
Based on the design specification, careful choice of implementation method and
tools can save a lot of time and work. MatLab is an excellent tool to design filters.
There are toolboxes available to generate VHDL descriptions of the filters which
reduce dramatically the time required to generate a solution. Time can be spent
evaluating different implementation alternatives. Proper choice of the
computation algorithms can help the FPGA architecture to make it efficient in
terms of speed and/or area.
II
INDEX PAGE
TABLES OF FIGURES................................................................................................V
1 INTRODUCTION .................................................................................................1
1.1 Why implement the digital filter? ................................................................ 1
1.2 Design and implementation of digital FIR filter .......................................... 3
1.3 Design outcome............................................................................................ 4
1.4 Overview of the report ................................................................................. 4
2 ISSUES IN IMPLEMENTING THE DIGITAL FILTER .....................................6
2.1 Hardware Architecture ................................................................................. 6
2.2 FIR FILTER IMPLEMENT OPTIONS..................................................... 10
2.2.1 HDL coder.......................................................................................... 11
2.2.2 IP Generators...................................................................................... 11
2.2.3 User Defined ...................................................................................... 15
2.3 Solution to the current practice .................................................................. 16
3 DESIGN SPECIFICATIONS ..............................................................................18
3.1 Expected features of new system ............................................................... 18
3.2 Overview of the system.............................................................................. 19
3.3 Design specification of FIR filters ............................................................. 21
3.4 Overview of UART.......................................................................................... 23
3.4 Design specifications of UART ................................................................. 23
3.5 Design model for transmitter ..................................................................... 25
3.5.1 Design for transmitter ........................................................................ 25
3.5.2 Pseudo code for baud rate clock......................................................... 26
3.5.3 Pseudo code for transmitter................................................................ 26
3.6 Design model for receiver.......................................................................... 28
3.6.1 Design and pseudo code for serial to parallel converter .................... 28
3.6.2 Design for finite state machine and pseudo code for finite state
machine 31
3.6.3 Design and pseudo code for 8 bits counter ........................................ 31
3.6.4 Finite state machine for receiver ........................................................ 32
3.6.5 Pseudo code for receiver .................................................................... 32
3.7 Design for 16 times baud rate clock and it’s pseudo code......................... 35
4 Implementation on board .....................................................................................36
4.1 Overview .................................................................................................... 36
4.2 Onboard peripheral..................................................................................... 37
4.2.1 DDR SDRAM, Flash ......................................................................... 37
4.2.2 Clock Sources .................................................................................... 38
4.2.3 10/100/1000 Ethernet PHY, LCD Panel ............................................ 38
4.2.4 USB 2.0 to RS232 Port ...................................................................... 38
4.2.5 RS232, Configuration and Debug Ports............................................. 39
4.3 Configuration setup.................................................................................... 40
4.4 Connection with PC ................................................................................... 41
5 Simulation Results ...............................................................................................42
5.1 Device utilization summary ....................................................................... 42
5.2 Simulation of the behavior model .............................................................. 48
6 CONCLUSION ....................................................................................................50
7 BIBLIOGRAPHY ................................................................................................51
8 APPENDICES .....................................................................................................52
III
Appendix A: Background knowledge about filter design...................................... 52
Appendix B: filter design by using FDATool........................................................ 54
Appendix B: Circuit diagram................................................................................. 63
Appendix C: Code.................................................................................................. 64
IV
TABLES OF FIGURES
Figure2-1 FPGAs Parallel Approach to DSP Enables Higher Computational Through

Figure 2-2 “Unrolled” Direct Form FIR Filter Architecture
Figure 2-3 Direct Form FIR Filter Architecture/Symmetric form
Figure 2-4 Direct Form FIR Filter Architecture/Anti-Symmetric form
Figure 2-5 Transposed Direct form FIR Filter Architecture
Figure 2-6 Serial Distributed Arithmetic FIR Filter Architecture
Figure 2-7 True-down DSP Design Flow
Figure 2-8 Bottlenecks in the DSP Design Process
Figure 2-9 Proper implement techniques choices for the different Filter hardware
architecture
Figure 3.1 Overview of the system
Figure 3-2 Magnitude response of the 8 bit-input and 8 bit-output FIR filter
Figure 3-3 Phase response of the 8 bit-input and 8 bit-output FIR filter
Figure 3-4 Function diagram of the UART
Figure 3-5 Finite state machine of the receiver
Figure 4-1 onboard peripheral
Figure 4-2 USB2.0 to RS-232 port
Figure 4-3 Assign Pins for the designed system
Figure 5-1 Simulation behavior of the UART Transmitter
Figure 5-2 Simulation behavior of the FIR filter
Figure 5-3 Simulation behavior of the16 times baud rate clock
Table 8.1The four classes of FIR filter and their associate properties
Table 8.2 Suitability of a given class of filter for the four FIR types
V
VI
1 INTRODUCTION
1.1 Why implement the digital filter?
Digital filters are used extensively in all areas of electronic industry. This is because
Digital filters have the potential to attain much better signal to noise ratios than
analog filters and at each intermediate stage the analog filter adds more noise to the
signal, the digital filter performs noiseless mathematical operations at each
intermediate step in the transform. As the digital filters have emerged as a strong
option for removing noise, shaping spectrum, and minimizing inter-symbol
interference in communication architectures. These filters have become popular
because their precise reproducibility allows design engineers to achieve performance
levels that are difficult to obtain with analog filters.
FIR and IIR filters are the two common filter forms. A drawback of IIR filters is that the
closed-form IIR designs are preliminary limited to low pass, band pass, and high pass filters,
etc. Furthermore, these designs generally disregard the phase response of the filter. For
example, with a relatively simple computational procedure we may obtain excellent
amplitude response characteristics with an elliptic low pass filter while the phase response
will be very nonlinear. (For any details pleaser refer to the chapter 7)
In designing filters and other signal-processing system that pass some portion of the
frequency band undistorted, it is desirable to have approximately constant frequency
response magnitude and zero phases in that band. For casual systems, zero phases are
not attainable, and consequently, some phase distortion must be allowed. As the
effect of linear phase with integer slope is a simple time shift. A nonlinear phase, on
the other hand, can have a major effect on the shape of a signal, even when the
1
frequency-response magnitude is constant. Thus, in many situations it is particularly
desirable to design systems to have exactly or approximately linear phase.
Compare to IIR filers, FIR filters can have precise linear phase. Also, in the case of FIR
filters, closed-form design equations do not exist. While the window Method can be applied
in a straightforward manner, some iteration may be necessary to meet a prescribed
specification. the window method and most algorithmic methods afford the possibility of
approximating more arbitrary frequency response Characteristics with little more difficulty
than is encountered in the design of low pass filters. Also, it appears that the design problem
for FIR filters is much more under control than the IIR design problem because there is an
optimality theorem for FIR filters that is meaningful in a wide range of practical situations.
The magnitude and phase plots provide an estimate of how the filter will perform;
however, to determine the true response, the filter must be simulated in a system
model using either calculated or recorded input data. The creation and analysis of
representative data can be a complex task. Most of the filter algorithms require
multiplication and addition in real-time. The unit carrying out this function is called
MAC (multiply accumulate). Depends on how good the MAC is, the better MAC the
better performance can be obtained. Once a correct filter response has been determined
and a coefficient table has been generated, the second step is to design the hardware
architecture. The hardware designer must choose between area, performance,
quantization, architecture, and response. (For any details pleaser refer to the chapter 7)
2
1.2 Design and implementation of digital FIR filter
MATLAB combines the high-level, mathematical language with an extensive set of
pre-defined functions to assist in the creation and analysis of filter data. Toolbox are
available for designing filter response and generating coefficient tables, each with
varying levels of sophistication. Graphical filter design tools provide selections for
specifying passband, filter order, and design methods, as well as provide plots of the
response of the filter to various standard forms of inputs. FDAtool from The
MathWorks, which can generate a behavioral model and coefficient tables.
Once a correct filter response has been determined and hardware architecture has
been defined, the implementation can be carried out. Three choices of technology exist
for the implementation of filter algorithms. These are: Programmable DSP chips, ASICs
and FPGAs. (For any details pleaser refer to the chapter 7)
At the heart of the filter algorithm is the multiply-accumulate operation;
Programmable DSP chips typically have only one MAC unit that can perform one MAC
in less than a clock cycle. DSP processors or programmable DSP chips are flexible, but
they might not be fast enough. The reason is that the DSP processor is general purpose
and has architecture that constantly requires instructions to be fetched, decoded and
executed. ASICs can have multiple dedicated MACs that perform DSP functions in
parallel. But, they have high cost for low volume production and the inability to make
design modifications after production makes them less attractive.
FPGAs have been praised for their ability to implement filters since the introduction
of DSP savvy architectures . which can be efficiently realized using dedicated DSP
resources on these devices. More than 500 dedicated multiply-accumulate blocks are
now available, making them exceptionally well suited for high-performance, high-
order filtering applications that benefit from a parallel, non-resource shared hardware
3
architecture. In this particular project, FPGA has been chosen as the implementation
tool. (For any details pleaser refer to the appendix B)
To program FPGA, hardware description language is needed.VHDL synthesis offers
an easy way to target a model towards different implementation.
1.3 Design outcome
The objective of the project is to meet the demand for the development of
courseware in the teaching unit of Digital Signal Processing 304 together with
Computer engineering 203 for the multidisciplinary purpose at the department of
computer and electrical engineering. There are several results has been achieved
through the project life which are: examination of design procedures or design cycle
from different approaches. There is more than one way to implement the digital FIR
filter. The purpose of the project is to identify the different approaches compare and
contrast the different methods, so based on the design specification; careful choice of
implementation method can save designer a lot of time and work. To investigate
relationship among filter order, hardware architecture and FPGA resources usage; to
identify the relationship between the performance and filter hardware architecture;
Using MatLab as major design tool to design filters then compare and contrast the
outcome with outcome from other software package.
1.4 Overview of the report
Chapter 2 covers a brief overview of digital filter design and implementation tools
and methods in use and problems in creating digital filter by using the FPGA board.
Chapter 3 describes the achievement of the project and the methodology has been
used.
4
Chapter 4 describes the implementation of the FIR filter on the FPGA board
Chapter 5 describes the simulation results in details.
Chapter 6 overviews what have been achieved in the project and its future direction.
5
2 ISSUES IN IMPLEMENTING THE DIGITAL FILTER
2.1 Hardware Architecture
Deciding on an optimal architecture for a particular application involves tradeoffs
between area, performance, and response. The abundance of dedicated MAC
(Multiply-Accumulate) blocks in the FPGA devices present the option for
implementing either an area efficient “serial” or a high-performance “parallel” FIR
filter or something in between. Serial architectures share a single MAC resource,
making it very area efficient but at the expense of performance, because one clock-
cycle is required for each tap delay register. This architecture is an excellent choice
for area sensitive, low-performance applications. (For any details pleaser refer to the
chapter 7)
Figure2-1 FPGAs Parallel Approach to DSP Enables Higher Computational Through
Although commonly implemented on FPGAs, a serial MAC FIR filter does not
exploit the large number of dedicated hardware MAC units available on the newer
devices. High-performance, high-order filtering applications, that are able to exploit
dedicated multiplier or DSP blocks, often turns to FPGAs for a solution. By
replicating the multiply-adder logic once for each tap delay register, the entire
filtering calculation can be performed in a single clock cycle. With more than 500
6
DSP blocks available, even very large order filters can sustain input sampling rates
over 400 MHz. This type of performance far exceeds even the fastest DSP processors
by orders of magnitude.
Below are the block diagram for an “unrolled” implementation of a direct form FIR
filter and it’s symmetric and anti-symmetric form (For any details pleaser refer to the
chapter 7)
X(n) X(n-1) X(n-2) X(n-3)

Register Register Register
a(0) a(1) a(2) a(3)
y(n)=a(0)(x(n)-x(n-3))+a(1)(x(n-1)-x(n-2))
Figure 2-2 “Unrolled” Direct Form FIR Filter Architecture
X(n) X(n-1) X(n-2)

a(1)
a(0)
y(n)=a(0)(x(n)+x(n-3))+a(1)(x(n-1)+x(n-2))
Figure 2-3 Direct Form FIR Filter Architecture/Symmetric form
X(n) X(n-1) X(n-2)

a(1)
a(0)
y(n)=a(0)(x(n)-x(n-3))+a(1)(x(n-1)-x(n-2))
Figure 2-4 Direct Form FIR Filter Architecture/Anti-Symmetric form
7
When targeting an FPGA device with dedicated DSP blocks capable of supporting
cascaded “multiply-add” operations such as the Xilinx Virtex 4, highest performance
is achieved using a “transposed” architecture. Utilizing the same resources as a
“direct” form FIR filter, data samples are applied in parallel to all tap multipliers
through pipeline registers. The products are then applied to a cascaded chain of
registered adders, combining the effect of accumulators and registers. (For any details
pleaser refer to the chapter 7)
X(n)
X(n) X(n) X(n) X(n)
a(3) a(2) a(1) a(0)
y(n)
a(3)X(n) a(3)X(n-1)+a(2)X(n) a(3)X(n-2)+a(2)X(n-1) y(n)=a(3)X(n-2)+a(2)X(n-1)+a(0)X(n)
Figure 2-5 Transposed Direct form FIR Filter Architecture
A fourth FIR filter architecture worth considering for FIR implementation in FPGAs
is called “distributed arithmetic” (DA). A simplified view of a DA FIR is shown in
Figure 2-8. In its most basic form, DA-based computations are bit-serial in nature –
serial distributed arithmetic (SDA) FIR. These computations can be parallelized
using multiple sets of Look-up table to achieve concurrency. This architecture is
referred to as parallel distributed arithmetic (PDA).
8
Figure 2-6 Serial Distributed Arithmetic FIR Filter Architecture
The advantage of the distributed arithmetic approach is its efficiency. The basic
operations required are a sequence of table look-ups, additions, subtractions, and
shifts of the input data sequence. These functions map efficiently into LUT-based
FPGA, such as Xilinx Virtex II, Virtex 4™ and Altera Stratix™ and Stratix II.
Distributed arithmetic architectures are an excellent choice for high-performance
filter applications targeting FPGA devices that do not include dedicated multipliers
or DSP48 blocks or when these resources are not available. A unique characteristic
of the DA architecture is that the sample rate is decoupled from the filter length,
making the DA architecture appealing for high-order filters. The trade off introduced
Here is one of silicon area (FPGA slices) for performance. As the filter length is
increased in a DA FIR filter, more logic resources are consumed, but throughput and
performance are maintained. The table below summarizes the benefits of each archite
cture.
Filter Architecture Pros Cons
Direct .Area efficient when .Performance decreases on

implemented using shared large order filters
multiply-accumulate
operations .Architecture does not
.Optimal architecture for allow for maximum
devices that offer a efficiency of DSP blocks
dedicated synchronous in the Virtex-4
multiply block such as the architecture
Xilinx Virtex-II and
Virtex-II Pro
9
Transposed  igh-performance when
H  ot as area efficient as
N
mapped to DSP blocks the direct form for small
that support multiply-add order filters
and multiply accumulate
operations such as the
Xilinx Virtex-4 and the
Altera Stratix-II
Distributed Arithmetic  ilter sample rate does

F  erformance diminishes
P
not decrease with an with increase in bit-widths
increase in filter order Consumes more area
Offers highest than direct or transposed
performance in FPGAs forms
when dedicated DSP or
synchronous multiplier
blocks are not available
Table 1 Pros and Cons of the different hardware architecture
Although this discussion on filter hardware architectures has been limited to constant
coefficient, single-rate, single-channel FIR filters, the concepts will apply for
programmable coefficient, multi-rate, and multi-channel variations. The adaptive
filters referenced in the overview, however, have distinct architectural characteristics
that depart significantly from FIR filters.
2.2 FIR FILTER IMPLEMENT OPTIONS
There is more than one way to reach the solution, proper choice of the
implementation tools and techniques can save the designer a lot of work and time.
10
2.2.1 HDL coder
HDL Coder is integrated with the graphical user interface and command line of the
Filter Design Toolbox to provide a unified design and implementation environment.
Filter Design and Analysis Tool (FDATool) or the MATLAB command line can be
used to design a filter and generate VHDL or Verilog code.The design specification
input to the Filter Design HDL Coder is quantized filters that can be create in one of
two ways: Design and quantize the filter with the Filter Design Toolbox, Design the
filter with the Signal Processing Toolbox and then quantize it with the Filter Design
Toolbox.
The Filter Design HDL Coder supports several important filter structures, including:
Direct Form Finite Impulse Response (FIR), Symmetric FIR, Anti-symmetric FIR
Transposed FIR, Direct Form I SOS, infinite Impulse Response (IIR), Direct Form I
Transposed SOS IIR, Direct Form II SOS IIR, and Direct Form II Transposed SOS II
Each of these IIR and FIR structures supports fixed-point and floating-point (double
precision) realizations. In addition, the FIR structures also support unsigned fixed-
point coefficients.
2.2.2 IP Generators
When implementing the actual filter on an FPGA, the designer is presented with a
fundamental choice of whether to use an IP core or design a custom implementation.
Both options have merits and limitations. FIR filter IP cores are readily available
from multiple sources with the most common forms being technology-specific enlists,
synthesizable RTL, and synthesizable MATLAB. All these generators can construct
a filter using a pre-defined coefficient table.
11
Xilinx and Altera both offer high-performance but device-specific filter IP in the
form of technology mapped enlists. This form of IP is typically the most cost
effective and offers the best results but provides the fewest options. Often each core
is hand-crafted to leverage device specific hardware resources, such as Xilinx Block
RAM, and includes placement information which helps maintain performance as
filter size grows. The format of the IP, however, precludes user modification of the
source or retargeting to a different device. The MathWorks offers device independent
filter IP; however, they don’t provide the ability to exploit the high-performing
resources of the FPGA. This form of IP tends to be a bit more expensive but offers
the user the additional options of retarget ability and modification of the RTL source.
The quality of the results will vary with the quality of the RTL model and the RTL
synthesis tool used for implementation. In general, however, some area and
performance is sacrificed to obtain general purpose, retarget able RTL models.
AccelChip offers both high-performance and device-independent IP available in
AccelWare® IP Toolkits. AccelWare IP cores are provided as synthesizable
MATLAB models that offer several advantages. First, the abstraction level of the IP
allows for the greatest range of options. Describing a filter in MATLAB is much
easier than creating a technology-specific placed netlist or RTL. This allows
AccelChip to build a wider range of hardware architecture options into the model,
providing the user the greatest number of choices. Secondly, the architecture of the
final hardware can be further modified during the DSP synthesis process with
AccelChip through the use of “directives.” This provides an efficient means of
leveraging the architectural features of the specific FPGA device, such as dedicated
DSP blocks, RAMs, ROMs, and shift registers, to achieve high quality of results in
12
the target device. The table below summarizes the synthesis directives available in
AccelChip. (For any details pleaser refer to the chapter 7)
AccelChip Synthesis Effect on Results

Directive
Rolling/unrolling of For loops Improves sustainable data throughput

rate by reducing the number of cycles for
processing per input sample
Expansion of vector and matrix Improves sustainable data throughput by

additions and multiplications reducing the number of cycles for
processing per input sample
RAM/ROM memory mapping of Improves FPGA utilization by mapping

1D and 2D arrays 1D and 2D arrays into dedicated FPGA
RAM resources
Pipeline insertion Improves sustainable data throughput

rate by improving clock frequency
performance
Shift register mapping Improves FPGA utilization by mapping

data buffers to shift register logic
IP generators provide an excellent choice when time-to-market or ease-of-use
demands prevail. The general purpose nature of these programmable IP generators,
however, seldom yields the optimal hardware for a specific application. For this, a
user-defined architecture should be considered.
13
Figure 2-7 True-down DSP Design Flow
14
2.2.3 User Defined
FIR filter implementations with aggressive area or performance targets may require
the use of a hand-crafted implementation over a pre-defined IP block. Doing so
allows the designer to exploit application-specific characteristics, such as unique
impulse response, coefficient symmetry, ones in the coefficient table, negative
coefficients, coefficient length, or knowledge about the dynamic range of the input
data to the filter. Most filter IP generators include some simple coefficient symmetry
optimization but do not account for every possible situation.
User-defined implementations have traditionally been achieved using hand-coded
RTL. This approach offers the designer a high degree of control over the
implementation process but is decoupled from the MATLAB simulation environment
and can be time consuming. A typical 32 tap FIR filter Verilog model can range
between 300 – 500 lines of hand-created VHDL or Verilog code and will often
require the instantiation of FPGA-specific RAMs or DSP blocks to achieve optimal
results.
The quantization must be determined for each signal and register in the design,
which typically requires a re-write of the original floating-point model in language
that supports fixed-point modeling and filter response analysis. Once complete, the
final RTL code must be verified back to the original filter model to insure functional
correctness. A recent survey conducted by AccelChip found that designers were
evenly split between these major DSP design tasks when asked to identify their most
significant bottlenecks.
15
Figure 2-8 Bottlenecks in the DSP Design Process
2.3 Solution to the current practice
Implementing hardware design in Field Programmable Gate Arrays (FPGAs) is a
formidable task. There is more than one way to implement the digital FIR filter.
Based on the design specification, careful choice of implementation method and
tools can save a lot of time and work. MatLab is an excellent tool to design filters.
There are toolboxes available to generate VHDL descriptions of the filters which
reduce dramatically the time required to generate a solution. Time can be spent
evaluating different implementation alternatives. Proper choice o f computation
algorithms can improve the FPGA architecture to make it efficient in terms of
speed and/or area.
16
FIR filter Design Methode
Design Tool (MATLAB, MATCAD....) Implemenation Tool(Xilinx, Altera...)
FDATool Look-up Hardware Description C Language

Table Language
FVTool Accelware
Generations
Multi-Rate FIR filter
(Idea form) Distribute Sym FIR using (SDA)
Direct Arithmetic Non-Sym FIR using (SDA)
Direct-Transposed Multi-Rate FIR filter Sym FIR using (PDA)
Symmetric (User defined input data) Non-Sym FIR using (PDA)
Anti-Symmetric
Figure 2-9 Proper implement techniques choices for the different Filter hardware architecture
Iinvestigation through the first phase of the project has found that, in the current
market there is so many design, implementation tools, each of them has the
advantage and disadvantage in some aspects .Conclusion can be draw from the
investigation is that to implement the linear phase filter MATLAB or MATCAD
is the best choice as there are several toolbox available which remove the
complexity for the design and implementation. To design and implement the
Multi-rate FIR filter, Filter visualisation tool from MATLAB and Accelware
generations software package is the best choice. For the distribute arithmetic
hardware architecture the best way to implement it is to use the look-up table
technique also distribute arithmetic algorithm is the best choice for implementing
FIR filters with Xilinx FPGAs. One of these implementations, namely, Serial
Distributed Arithmetic (SDA) exhibits efficiency in terms of area. The other
implementation, named Parallel Distributed Arithmetic (PDA) shows a very high
sample rate. Linear-phase response FIR filters were also developed and
implemented in an FPGA using the DA technique.
17
3 DESIGN SPECIFICATIONS
3.1 Expected features of new system
The objective of this system is to provide a hardware platform to test FIR filters that
have been generated using MATLab. The system performs the following functions:
1. Communicate with a PC using a standard RS-232 serial interface.
2. Receive a data file from the PC. Data will be received in binary form so they
can be applied to the FIR filter without further transformation. For this
exercise consider that the FIR filter receives 8-bit samples.
3. The outputs of the FIR filter must be returned to the PC where they will be
stored in a file. Again assume data from the FIR filter have to be sent the
outputs in binary form (without transformation).
4. To send and receive the files from/to the PC a program like “HyperTerminal”
can be used. The serial port must be configured with the following settings:
Bauds rate is 57600 which is 54 times slower than the 50MHz crystal clock
onboard;1 start bit, 1 stop bit ,8 data bits, No parity ,No flow control.
18
3.2 Overview of the system
8
DRdy
PC RS-232
interf UART FIR filter
ace 8
WrD EOF
50 MHz
Clock for onboard
Clock
baud rate oscillato
divider
r
FPGA development board
Figure 3.1 Overview of the system
In above figure, there are two blocks that do not have to be designed: the RS-232
interface, it is provided as part of the development board, and the FIR filter, it is
implemented using MATLab.
The blocks that have to be developed are the Universal Asynchronous Receiver-
Transmitter (UART), the clock divider. A brief description of each one of these
blocks provided next.
UART must receive and transmit data in a serial format. Every frame that is
transmitted contains 1 start bit, 8 bits of information and 1 stop bit. The function of
the UART can be divided two: the transmission and the reception circuits. In this
particular application, the FIR filter receives 8-bit numbers at its input. Hence, it is
possible to feed the filter directly from the UART.
The receiver must perform the following functions:
• Detect the start of a frame by sensing the start bit.
19
• Sample the 8 bits of information at the frequency corresponding to the baud
rate, least significative bit first.
• Present the 8 bits of information as one byte to the FIR filter.
• Generate a signal DRdy to latch the new data into the filter. This signal must
work as a “write” signal to the filter.
The transmitter must perform the following functions:
• Receive 8-bit data from the FIR filter. Data will be latched in the transmitter
using the WrD signal generated by the data manager.
• Transmit the data in serial format at the baud rate, least significant bit first. 1
start bit at the beginning and 1 stop bit at the end of the frame must be
inserted
Clock divider in this circuit must divide the 50MHz clock available on the board to
deliver a frequency related to the baud rate. This new clock has to be used to receive
and send data over the serial channel at the correct speed.
FIR filter with its function is to receive 8 bit data from the UART and apply the
filtering process for the incoming data. This circuit must provide an interface
compatible with the UART. In figure above, this interface is represented by the
signal EOF (End Of Filtering) that indicates when the filter has a new data available
on its output.
20
3.3 Design specification of FIR filters
Low-pass FIR filter has been implemented with

• Response type: Low pass;
• Design method: Equi ripple;
• Density factor: 20;
• Filter order: order 5;
• Hardware architecture: Direct form;
• Sampling frequency: 48000Hz;
• Pass band frequency: 9600Hz;
• Stop band frequency: 12000Hz;
• Input data length: 8 bits;
• Output data length: 8 bits;
Figure 3-2 Magnitude response of the 8 bit-input and 8 bit-output FIR filter
21
Figure 3-3 Phase response of the 8 bit-input and 8 bit-output FIR filter
22
3.4 Overview of UART
Figure 3-4 Function diagram of the UART
The UART is for interfacing computers or microprocessors to an asynchronous serial
data channel. The receiver converts serial start, data, parity and stop bits. The
transmitter converts parallel data into serial form and automatically adds start, parity
and stop bits. The data word length can be 5, 6, 7 or 8 bits. Parity may be odd or even.
Parity checking and generation can be inhibited. The stop bits may be one or two or
one and one-half when transmitting 5-bit code. It can be used in a wide range of
applications including modems, printers, peripherals and remote data acquisition
systems. (For any details pleaser refer to the chapter 7)
3.4 Design specifications of UART
In the UART (Universal Asynchronous Receive and Transmission) all data
transmit or received in unit of bit, at the beginning and the end of the data frame
there are the start bit and the stop bit respectively.
23
Figure 3-4 Flow chart representation of the system
Start bit is logic 0, it always stays in the front of the data frame in order to remain
the receiver to receive the data, in the process of the receiving the start bit will be
removed. According to the TCP/IP, it allows to have the data frame within the
range of the 5 to 8. Usually, the data frame is7 or 8 bits, if it required to
transmitting ASCII data (Assume all data in binary number), in this design the
data frame is in the length of the 8 bits. When the data was transmitted start from
the LSB and the MSB is at the end of the data frame. Such as the letter C in the
ASCII code it can be represented in 67 in decimal number and 01000011 in the
binary number, when it started to transmitted it will become 11000010. There is
something wrong with the data frame. Stop bit is logic 1, it always stays at the
end of the data frame, and it can be 1 bit, 1.5bit or 2 bits. In the project, it is 1 bit.
And there is no parity bit and framing error bit. Baud Rate is the number of the
bits can be received and transmitted within a second. If the time need for
transmitting and receiving one bit is t then the baud rate will be 1 over t, the
corresponding transmit and receive clock is 1/t Hz. The baud rate for the
transmitter and receiver should be consistent otherwise the error can be occurred,
24
in this particular design the baud rate has been assumed as 54 times slower than
the clock signal on the board.
3.5 Design model for transmitter
To simplify the design for the project, there is a start bit, 8 bits data in length and a
stop bit and there is no parity bit, assume the baud rate is 54 times slower than the
clock signal from the board.
3.5.1 Design for transmitter
According to design specification, it needs to send 10bits data (1 start bit, 8 bits data,
1 stop bit), after the stop bits, the transmitter will stop and the logic level will stay
back to logic1, then it will stay in idle state to wait for the next transmission. The
transmitter section accepts parallel data, formats the data and transmits the data in
serial form on the serial data output (sdo) terminal. Data is loaded from the inputs din
7 –din0 into the transmitter buffer register by applying logic high on the write input
port. Valid data must be present at least tset prior to and thold following the rising
edge of write signal. If words less than 8 bits are used, only the least significant bits
are transmitted. The character is right justified, so the least significant bit
corresponds to txbr(0). The rising edge of write signal clears transmitter buffer
register. 0 to 1 Clock cycles later, data is transferred to the transmitter register, the
tx_rd pin goes to a low state, tx_rd is set high and serial data information is
transmitted. The output data is clocked at a clock rate 16 times the data rate. A
second high level pulse on write loads data into the transmitter buffer register. Data
transfer to the transmitter register is delayed until transmission of the current data is
complete. Data is automatically transferred to the transmitter register and
25
transmission of that character begins one clock cycle later.
3.5.2 Pseudo code for baud rate clock
If the logic level 1 at the input port reset has been detected then
The clk signal is logic 0
Variable count has been initialized to 000
Elsif there is a rising edge of the 16 times faster baud rate clock then
If the variable count is in the initial state 000 then
Invert the clk signal
Variable count increment by 1
3.5.3 Pseudo code for transmitter
The clk signal is logic 0
Variable count has been initialized to 000
Elsif there is a rising edge of the 16 times faster baud rate clock then
If the variable count is in the initial state 000 then
Invert the clk signal
Variable count increment by 1
Signal of the 8 bits serial register is 00000000
Signal of the empty shift register is logic 0
Signal of tag bits for detecting of the start bit is logic 0
Output of the transmitter is logic 0
Elsif there is rising edge of the baud rate clock then
26
If shifting of the byte is done and data is ready in txhold then
Load transmitter register from 8 bits buffer
Tag bits for detecting is on
Empty shift register is on
Send logic 0 start bit into the output port sdo
Else
Shift data
If one byte of data has been shift out then
Shift out the stop bit or idle state
Else
Shift the least significant bit of the serial register txsr(0)
Send the logic 0 into the signal txrd
Variable write signal delayed 1 wr1 is logic 0
Variable write signal delayed 2 wr2 is logic 0
Variable txdone1 is logic 0
Elsif there is a rising edge of the16 times faster baud rate clock then
If there is falling edge on write singal
Signal txrd is in the high state
Elsif there is a falling edge on txdone signal( Txbr has been read)
Send the logic 0 into the signal txrd
Variable wr2 an variable wr1 is equal
Send the write strobe signal from input port wr to the varible wr2
Send the logic level of the txdone to the variable txdone1
Latch data bus during write
27
Transmitter ready for write when no data is in the transmitter buffer register
3.6 Design model for receiver
Data is received in serial form at the input port Din. When no data is being received,
Din must remain high. The data is clocked through the Clock. The clock rate is 16
times the data rate.
The UART receiver is implemented as a structural model. Some parts of it are:
1) Finite state machine
2) 8 bits counter
3) Serial to parallel converter
3.6.1 Design and pseudo code for serial to parallel converter
The serial to parallel converter is controlled by finite state machine which has only
two states. State S0 is idle and S1 is data tapping state. Output of FSM is control
signal for serial to parallel converter which uses a counter which counts till certain
states to keep track of data and simultaneously tapping data in an 10 bit logic vector.
After tapping of tenth bit the counter signals to finite state machine back to state S1.
When the current state is state S0
If the start bit of the incoming data has been detected then
Go the next state S1
The datardy signal is in the lower state 0
If the start bit of the incoming data has not been detected or idle state
The next state is state S1
28
When the current state is in the state S1 then
If the counter is in the state hex number 18 then
Sampling the start bit into the tmpdata register in bit tmpdata(0);
The next sate is state S1
ElsIf the counter is in the state hex number 28 then
Sampling the incoming data Din into the tmpdata register in bit tmpdata(1);
29
Output the data from the buffer register into the output port
The datardy signal is in the high state 1
The datardy signal is in the high state 1
ElsIf the counter is in the state hex number 9D then
The datardy signal is in the low state 0
Go back to the state S0
Else the counter is in the other state
30
3.6.2 Design for finite state machine and pseudo code for finite state machine
The finite state machine is triggered from state S0 to state S1 when there is a rising
edge of the clock signal; it stays at state S0 when the reset signal is at the higher level.
If the signal at the input port reset has been scanned as logic 1 then
The current state is state S0
If there is a rising edge of the clock signal then
The current change to the next state S1
3.6.3 Design and pseudo code for 8 bits counter
The 8 bits counter signals at the predefined state; the 8 bit counter uses this signal as
its clock to tap data. It is counting 11 such signals from 8 bit counter.
If the logic level at the reset input port has been scanned as the logic 1 or the current
state is in the state S0
Then counter initialize itself to the “00000000”
Elsif there is the rising edge of the clock signal the
Counter increment by 1
31
3.6.4 Finite state machine for receiver
If there is falling edge of the incoming data Din
Send data Do nothing

S0
S10
S1
Load the the incoming data to the bre(8) S9
S2 Load the the incoming data to the bre(1)
S8
Load the the incoming data to the bre(7) Load the the incoming data to the bre(2)
S3
S7
Load the the incoming data to the bre(6)

S4 Load the the incoming data to the bre(3)
S6
S5
Load the the incoming data to the bre(5) Load the the incoming data to the bre(4)
Figure 3-5 Finite state machine of the receiver
3.6.5 Pseudo code for receiver
If the current state is state S0 and If there is the falling edge of the
incoming data has been detected (start bit 0 has been found) then
Go to the state S1
Else stay in the state S0 wait until the start bit has been detected
When the current the state is in state S1 then
If the state control signal hex number 18 has been detected
Then stay in the state S1
Sampling the incoming data into the tmpdata register tmpdata bit 0
Apply the signal logic 0 to Drdy
32
If the state control signal hex number38 has been detected
33
Output the all sampling data to the port dout
If the state control signals hex number 9D has been detected
Then go back the state S0
Else
When the State is not state S0 or S1 then
34
3.7 Design for 16 times baud rate clock and it’s pseudo code
To generate baud rate of 16 times faster than the baud rate of 57600 frequency of
onboard clock needs to be divided by (16*57600 = 921600) which is 54.2534.
After rounding it becomes integer number 54.
If the signal 1 has been scanned at the input port en
If there is rising edge of the 50MHz clock
Then count incremented by 1
If there is 54 pulses has been passed
Then the output the logic 1
And the variable count has been initialized to 0
Else output logic 0
35
4 Implementation on board
4.1 Overview
The Virtex-4™ FX12 LC Development kit provides a complete development
platform for designing and verifying applications based on the Xilinx Virtex-4 FPGA
family. This kit enables designers to implement DSP and embedded processor based
applications with extreme flexibility using IP cores and customized modules. The
Virtex-4 FPGA along with its integrated PowerPC processor core makes it possible
to prototype processor based applications, enabling software designer early access to
a hardware platform prior to working with the final product/target board. The Virtex-
4 FX12 LC system board utilizes the Xilinx XC4VFX12-10FF668C FPGA. The
board includes 64MB of DDR SDRAM, 4MB of Flash, USB-RS232 Bridge, a
10/100/1000 Ethernet PHY, 100 MHz clock source, RS-232 port, and additional user
support circuitry to develop a complete system. The board also supports the P160
expansion module standard, allowing application specific expansion modules to be
easily added.
36
4.2 Onboard peripheral
Figure 4-1 Onboard peripheral
4.2.1 DDR SDRAM, Flash
The Virtex-4™ FX12 LC development board provides 64MB of DDR SDRAM
memory (x16). A high-level block diagram of the DDR SDRAM interface is shown
below followed by a table describing the SDRAM memory interface signals. The
Virtex-4™ FX12 LC development board provides 4MB of flash memory (x16). A
high-level block diagram of the flash interface is shown below followed by a table
describing the flash memory interface signals.
37
4.2.2 Clock Sources
The Clock Generation section of the Virtex-4 FX12 LC board provides all the
necessary clocks for the PowerPC processor, the I/O devices located on the board, as
well as the DDR SDRAM memory. An on-board 100MHz oscillator provides the
system clock input to the processor section. This 100Mhz clock will be used by the
Virtex-4 Digital Clock Managers (DCMs) to generate various processor clocks. In
addition to the above clock inputs, a socket is provided on the board that can be used
to provide single ended LVTTL clock input to the FPGA via an 8 or 4-pin oscillator.
4.2.3 10/100/1000 Ethernet PHY, LCD Panel
The Virtex-4 FX12 LC development board provides a 10/100/1000 Ethernet port for
network connection. This interface uses the Virtex-4 embedded 10/100/1000 MAC.
A high-level block diagram of the 10/100/1000 Ethernet interface is shown in the
following figure followed by FPGA pin assignments for this interface.
The Virtex-4 FX12 LC development board provides an 8-bit interface to a 2x16 LCD
panel (MYTECH MOC-16216B-B). The following table shows the LCD interface
signals.
4.2.4 USB 2.0 to RS232 Port

The Virtex-4 FX12 LC development board implements a USB 2.0 port. This is
accomplished using the Cygnal CP2101 USB-to-UART Bridge Controller. The
FPGA interfaces to the CP2102 as a simple UART. The UART interface to the
CP2102 can run at speeds ranging from 300 to 921,600 baud. The CP2102 is a highly
integrated USB-to-UART Bridge Controller, providing a simple solution for USB
serial communications using a minimum of components and PCB space. The
CP2102 includes a USB 2.0 full-speed function controller, USB transceiver,
oscillator, EEPROM, and asynchronous serial data bus (UART) with full modem
38
control signals in a compact 5mm X 5mm MLP-28 package. No other external USB
components are required. The on-chip EEPROM may be used to customize the USB
Vendor ID, Product ID, Product Description String, Power Descriptor, Device
Release Number, and Device Serial Number as desired. The EEPROM is
programmed on-board via the USB allowing the programming step to be easily
integrated into the product manufacturing and testing process. Royalty-free Virtual
COM Port (VCP) device drivers provided by Cygnal allow the Virtex-4 FX12 LC
development board to appear as a COM port to PC applications. The CP2102 UART
interface implements all RS232 signals, including control and handshaking signals.
These signals are interfaced to the Virtex-4 FPGA as follows:
Figure 4-2 USB2.0 to RS-232 port
4.2.5 RS232, Configuration and Debug Ports
The Virtex-4 FX12 LC development board provides an RS232 interface with RX and
TX signals and jumpers for connecting the RTS and CTS signals. The following
figure shows the RS232 interface to the Virtex-4 FX12 FPGA.Various methods of
configuration and debug support are provided on the Virtex-4 FX12 development
board to assist designers during the testing and debugging of their applications. The
following sections provide brief descriptions of each of these interfaces.
39
4.3 Configuration setup
The configuration for implementation on the board has been set up as the following.
For the 50MHz clock onboard Pin 184 has been assigned For the USB switch P16
has been assigned. For the reset button Pin 22 has been assigned. For the serial data
input into the receiver Pin11 has been assigned. For the serial data output from the
transmitter Pin 3 has been assigned. (For any details pleaser refer to the chapter 7 for the
reference reading)
Figure 4-3 Assign Pins for the designed system
40
4.4 Connection with PC
Design and implementation needs to be done at the host computer system before it
downloads the configuration file into the FPGA. Connection needs to be made
between the host computer system and target system by using connection cable.
Figure 4-3 Connection between the host system and the target system
A terminal program called hype terminal is an application that will enable a PC to
communicate directly with a modem. This can be useful to test the overall system
design and diagnose problems. For this particular design the baud rate for the
HyperTerminal has been setup as 57600 baud rate. (For any details pleaser refer to the
chapter 7 for the reference reading)
41
5 Simulation Results
5.1 Device utilization summary
As one of the objectives of the project is to investigate the resources usage inside of
FPGA, simulation result shown that the amount of the component has been used in
the FPGA as shown in the following diagram for the linear phase FIR filter for the
order 5, order 10 and order 15.
Direct FIR Order 5

Selected Device : 3s400pq208-4
Number of Slices: 143 out of 3584 3%
Number of Slice Flip Flops: 131 out of 7168 1%
Number of 4 input LUTs: 172 out of 7168 2%
Number of bonded IOBs: 54 out of 141 38%
Number of MULT18X18s: 6 out of 16 37%
Number of GCLKs: 1 out of 8 12%
Transposed FIR Order 5

Symmetric FIR Order 5

Anti-Symmetric FIR Order 5

42
Direct FIR order 10




Direct FIR Order 15

43



From the observation simulation result the conclusions can be drawn is that for the
same filter order, direct hardware architecture takes larger resources than the
transposed architecture. Anti-Symmetric hardware architecture takes larger resources
than the symmetric hardware architecture as symmetric hardware architecture shares
some component in common so the result of the simulation is reasonable.
44
Resources Usage for Direct FIR filter
600
500
400
300
200
Order 5
Order 10
100
Order 15
0
Slice FFs LUTs IOBs MULT GCLKs
Observation from the graph has found that for direct form FIR filter architecture as
the filter order increases the amount of the resources usage is increasing correspond.
Resources Usage for Transposed form FIR filter
600
500
400
300
200 Order 5
100
Order 10
Order 15
0
45
Observation from the graph has found that for transposed form of FIR filter
architecture as the filter order increases the amount of the resources usage is
increasing correspond.
Resources Usage for Symmetric FIR filter
400
350
300
250
200
150
Order 5
100
Order 10
50
Order 15
0
Observation from the graph has found that for Symmetric form FIR filter architecture
as the filter order increases the amount of the resources usage is increasing
correspond.
Resources Usage for Anti-Symmetric FIR filter
600
500
400
300
200 Order 5
100 Order 10
Order 15
0
46
Observation from the graph has found that for Anti-symmetric form FIR filter
architecture as the filter order increases the amount of the resources usage is
increasing correspond.
47
5.2 Simulation of the behavior model
The simulation result for the UART transmitter is correct and the behaviour model
has met the design requirement as the result shown that the most significant bit
received from the parallel port of the 8 bits register now is becoming the least
significant bit and the least significant bit received form the parallel port of the 8 bits
register now is becoming the most significant bit and so on. There is no parity check
bit and frame error bit.
Figure 5-1 Simulation behavior of the UART Transmitter
It is really hard to judge the behaviour of the FIR low pass filter form this simulation
data however the simulation behaviour is generated based on the design specification
in section 3.3 so the behaviour model should be correct.
48
Figure 5-2 Simulation behavior of the FIR filter
The simulation result for the 16 times baud rate clock is correct as it can be seen that
there is only one pulse signal for the clk16x only when 54 50MHz pulses passed so it
is correct.
Figure 5-3 Simulation behavior of the16 times baud rate clock
The simulation result for the receiver is correct.
Figure 5-4 Simulation behavior of the UART receiver
49
6 CONCLUSION
The traditional methodology for designing a filter consists of two phases: system
specification and hardware implementation. Both phases require multiple iterations
and can involve a four-to-six-week process just to ensure that a single functional
block operates to specification within the system. Creating this filter from the ground
up, using either VHDL or other design entry methods, would have required too much
development time. MatLab is an excellent tool to design filters. There are toolboxes
available to generate VHDL descriptions of the filters which reduce dramatically the
time required to generate a solution.
There is more than one way to implement the digital FIR filter. Based on the design
specification, careful choice of implementation method and tools can save a lot of
time and work. Time can be spent evaluating different implementation alternatives.
Proper computation algorithm can improve the FPGA architecture to make it
efficient in terms of speed and/or area.
As the filter order increase the amount of resources usage increase correspondently,
Non- Symmetric FIR filter took larger amount of resources than symmetric FIR filter
as the Amount of hardware component to implement the Non-Symmetric FIR filter is
bigger than the symmetric FIR filter.
About future work for the improvement of the project, digital to analogue converter,
analogue to digital converter and IIR filter can be tried to implement on the FPGA
board.
50
7 BIBLIOGRAPHY
Abdelhak M. Zoubir “Digital Signal Processing and Spectral Analysis”, Manuscript

for DSP 304, pp18, 60, 75, 1999
AccelChip, Inc., “AccelWare IP for DSP Design Targeting FPGAs and ASICs” pp5,
2004
AccelChip, Inc., “Comparison of Methods for Implementing DSP Algorithms ” pp. 5,

July 2004.
AccelChip, Inc., “Filter Design Methods for FPGAs” pp. 1-10, 2004.
AccelChip, Inc., “Automated Conversion of Floating-point to Fixed-point

MATLAB” pp.6 2004.
AccelChip, Inc., “What’s the Right Language for DSP System-level

Design?”pp3,2004
AccelChip, Inc., “MATLAB as a System Modeling Language for Hardware,” pp9,

2004.
AccelChip, Inc., “Top-Down DSP Design Flow to Silicon Implementation,” pp6-8,

2004.
Nirman Kendra, “SI40UART01 Universal Asynchronous Receiver/Transmitter Core

Factsheet” 12/20/2002
Intersil, Inc..“CMOS Universal Asynchronous Receiver Transmitter (UART)” March

1997
Memcs, Inc..“Virtex-4™ FX12 LC Development Board User’s Guide”,2005
51
8 APPENDICES
Appendix A: Background knowledge about filter design
Table 8.1The four classes of FIR filter and their associate properties
Table 8.2 Suitability of a given class of filter for the four FIR types
52
Figure 8.3(a) Kaiser windows for β = 0 (solid), 3 (semi-dashed), and 6 (dashed) and N = 21.(b) Fourier
transforms corresponding to windows in (a). (c) Fourier transforms of Kaiser Windows with β = 6 and N
= 11 (solid), 21 (semi-dashed), and 41 (dashed).
53
Appendix B: filter design by using FDATool
FDATool has been used to entry the design specification Start the FDATool by
entering the fdatool command in the MATLAB Command Window. MATLAB
displays the Filter Design & Analysis Tool dialog.
In the Filter Design & Analysis Tool dialog, check that the following filter options
are set:
Option Value
Response Type Lowpass
Design Method FIR Equiripple
Filter Order Specify order: 5
Options Density Factor: 20
Frequency Specifications Units: KHz

Fs: 10
Fpass: 2
Fstop: 3
54
Magnitude Specifications Units: dB
Apass: 1
Astop: 80
Click the Set Quantization Parameters button in the left-side tool bar. The FDATool
displays a Filter arithmetic menu in the bottom half of its dialog.
Select Fixed-point from the Filter arithmetic list. The FDATool displays the first of
three tabbed panels of quantization parameters across the bottom half of its dialog.
You use the quantization options to test the effects of various settings with a goal of
optimizing the quantized filter's performance and accuracy.
Tab Parameter Setting
Coefficients Numerator word length 16
Best-precision fraction Selected
55
lengths
Use unsigned Cleared

representation
Scale the numerator Cleared

coefficients to fully utilize
the entire dynamic range
Input/Output Input word length 16
Input fraction length 15
Output word length 16
Avoid overflow Selected
Filter Internals Round towards Floor
Overflow mode Saturate
Product mode Full precision
Accum. mode Keep MSB
Accum. word length 40
Cast signals before Selected
accum
Click Apply
Start the Filter Design HDL Coder by selecting Targets–>Generate HDL in the
FDATool dialog. The FDATool displays the Generate HDL dialog.
56
Find the Filter Design HDL Coder online help. Use the help to learn about
product details or to get answers to questions as you work with the designer.
a. In the MATLAB window, click the Help button in the toolbar or click
Help–>Full Product Family Help.
b. In the Help browser's Contents pane, select Filter Design HDL
Coder.
c. Minimize the Help browser.
Click the Help button. The FDATool displays context-sensitive help for the
dialog. As necessary, use the Help button on the other Filter Design HDL Coder
dialogs for context-sensitive help on those dialog views.
Close the Help window.
Place your cursor over the Name label or text box in the HDL filter pane of the
Generate HDL dialog and right-click. A What's This? button appears.
57
Click What's This? The Filter Design HDL Coder opens context-sensitive help
that describes the Name option. Use the context-sensitive help as needed while
using the GUI to configure options that control the contents and style of the
generated HDL code and test bench. A help topic is available for each option and
pane.
In the Name text box of the HDL filter pane, replace the default name with
basicfir. This option names the VHDL entity and the file that is to contain the
filter's VHDL code.
In the Name text box of the Test bench types pane, replace the default name
with basicfir_tb. This option names the generated test bench file.
Click HDL Options. The Filter Design HDL Coder displays an HDL Options
dialog.
In the Comment in header text box, type Tutorial - Basic FIR Filter and then
click Apply. The Filter Design HDL Coder adds the comment to the end of the
header comment block in each generated file.
58
Select the Ports tab. The Ports pane appears.
Change the names of the input and output ports. Replace filter_in with data_in
and filter_out with data_out.
Clear the check box for the Add input register option. The Ports pane should
now look like the following.
Click Apply and then OK to register your changes and close the HDL Options
dialog.
Click Test Bench Options. The Filter Design HDL Coder displays a Test Bench
Options dialog.
59
You use this dialog to customize the generated test bench.
For this tutorial, apply the default settings by clicking OK.
In the Generate HDL dialog, click Generate to start the code generation process.
The Filter Design HDL Coder displays the following messages in the MATLAB
Command Window as it generates the filter and test bench VHDL files.
### Starting VHDL code generation process for filter:

basicfir
### Generating basicfir.vhd file in: hdlsrc
### Starting generation of basicfir VHDL entity
### Starting generation of basicfir VHDL architecture
### HDL latency is 1 samples
### Successful completion of VHDL code generation process
for filter: basicfir
### Starting generation of VHDL Test Bench
### Generating input stimulus
### Done generating input stimulus; length 3429 samples.
### Generating VHDL file basicfir_tb.vhd in: hdlsrc
### Please wait ........
### Done generating VHDL test bench.
As the messages indicate, the Filter Design HDL Coder creates the directory hdlsrc
under your current working directory and places the files basicfir.vhd and
basicfir_tb.vhd in that directory.
The generated VHDL code has the following characteristics:
• VHDL entity named basicfir.
60
• Registers that use asynchronous resets when the reset signal is active
high (1).
• Ports have the following names:
VHDL Port Name
Input data_in
Output data_out
Clock input clk
Clock enable input clk_enable
Reset input
reset
• An extra register for handling filter output.
• Clock input, clock enable input and reset ports are of type
STD_LOGIC and data input and output ports are of type
STD_LOGIC_VECTOR.
• Coefficients are named coeffn, where n is the coefficient number,
starting with 1.
• Type safe representation is used when zeros are concatenated: '0' &
'0'...
• Registers are generated with the statement ELSIF clk'event AND
clk='1' THEN rather than with the rising_edge function.
• The postfix string _process is appended to process names.
The generated test bench:
• Is a portable VHDL file.
• Forces clock, clock enable, and reset input signals.
61
• Forces the clock enable input signal to active high.
• Drives the clock input signal high (1) for 5 nanoseconds and low (0)
for 5 nanoseconds.
• Forces the reset signal for two cycles plus a hold time of 2
nanoseconds.
• Applies a hold time of 2 nanoseconds to data input signals.
• Applies impulse, step, ramp, chirp, and white noise stimulus types.
When you have finished generating code, click Close to close the Generate HDL
dialog.
Low-pass FIR filter has been implemented with
62
Appendix B: Circuit diagram
Figure8-4 Circuit diagram of the overall system
63
Appendix C: Code
16 times baud rate clock
library ieee;
use ieee.std_logic_1164.all;
entity count54 is
port(clk50Mx,en:in std_logic;
clk16x:out std_logic);
end count54;
architecture count54_arc of count54 is

begin
process(clk50Mx,en)
variable count:integer range 0 to 54 :=0;
begin
if en='0' then
clk16x <= '0';
elsif (rising_edge(clk50Mx)) then
count:=count+1;
if count=54 then
clk16x<='1';
count:=0;
else
clk16x<='0';
end if;
end if;
end process;
end count54_arc;
Direct form FIR filter

LIBRARY IEEE;
USE IEEE.std_logic_1164.all;
USE IEEE.numeric_std.ALL;
ENTITY filter IS
PORT( clk : IN std_logic;
clk_enable : IN std_logic;
reset : IN std_logic;
data_in : IN std_logic_vector(7 DOWNTO 0); -- sfix8_En7
data_out : OUT std_logic_vector(7 DOWNTO 0) --
sfix8_En7
);
END filter;
64
----------------------------------------------------------------
--Module Architecture: filter
----------------------------------------------------------------
ARCHITECTURE rtl OF filter IS
-- Local Functions
-- Type Definitions
TYPE delay_pipeline_type IS ARRAY (NATURAL range <>) OF signed(7
DOWNTO 0); -- sfix8_En7
-- Constants
CONSTANT coeff1 : signed(15 DOWNTO 0) := to_signed(-4423,
16); -- sfix16_En16
CONSTANT coeff2 : signed(15 DOWNTO 0) := to_signed(18623,
16); -- sfix16_En16
16); -- sfix16_En16
16); -- sfix16_En16
16); -- sfix16_En16
CONSTANT coeff6 : signed(15 DOWNTO 0) := to_signed(-4423,
16); -- sfix16_En16
-- Signals
SIGNAL delay_pipeline : delay_pipeline_type(0 TO 4); -- sfix8_En7
SIGNAL data_in_regtype : signed(7 DOWNTO 0); -- sfix8_En7
SIGNAL product6 : signed(31 DOWNTO 0); -- sfix32_En31
SIGNAL mul_temp : signed(23 DOWNTO 0); -- sfix24_En23
SIGNAL mul_temp_1 : signed(23 DOWNTO 0); -- sfix24_En23
SIGNAL sum1 : signed(34 DOWNTO 0); -- sfix35_En31
SIGNAL add_temp : signed(35 DOWNTO 0); -- sfix36_En31
SIGNAL add_temp_1 : signed(35 DOWNTO 0); -- sfix36_En31
SIGNAL output_typeconvert : signed(7 DOWNTO 0); -- sfix8_En7
65
SIGNAL output_register : signed(7 DOWNTO 0); -- sfix8_En7
BEGIN
-- Block Statements
Delay_Pipeline_process : PROCESS (clk, reset)
BEGIN
IF reset = '1' THEN
delay_pipeline(0 TO 4) <= (OTHERS => (OTHERS => '0'));
ELSIF clk'event AND clk = '1' THEN
IF clk_enable = '1' THEN
delay_pipeline(0) <= signed(data_in);
delay_pipeline(1 TO 4) <= delay_pipeline(0 TO 3);
END IF;
END IF;
END PROCESS Delay_Pipeline_process;
data_in_regtype <= signed(data_in);
mul_temp <= delay_pipeline(4) * coeff6;

product6 <= resize( mul_temp & '0' & '0' & '0' & '0' & '0' & '0' & '0' & '0', 32);
mul_temp_1 <= delay_pipeline(3) * coeff5;

product5 <= resize( mul_temp_1 & '0' & '0' & '0' & '0' & '0' & '0' & '0' & '0', 32);



mul_temp_5 <= data_in_regtype * coeff1;

add_temp <= resize(product1, 36) + resize(product2, 36);

sum1 <= (34 => '0', OTHERS => '1') WHEN add_temp(35) = '0' AND
add_temp(34) /= '0'
ELSE (34 => '1', OTHERS => '0') WHEN add_temp(35) = '1' AND add_temp(34)
/= '1'
ELSE (add_temp(34 DOWNTO 0));
add_temp_1 <= resize(sum1, 36) + resize(product3, 36);

sum2 <= (34 => '0', OTHERS => '1') WHEN add_temp_1(35) = '0' AND
add_temp_1(34) /= '0'
ELSE (34 => '1', OTHERS => '0') WHEN add_temp_1(35) = '1' AND
add_temp_1(34) /= '1'
66
ELSE (add_temp_1(34 DOWNTO 0));

add_temp_2(34) /= '0'
add_temp_2(34) /= '1'

add_temp_3(34) /= '0'
add_temp_3(34) /= '1'

add_temp_4(34) /= '0'
add_temp_4(34) /= '1'
output_typeconvert <= (7 => '0', OTHERS => '1') WHEN sum5(34) = '0' AND
sum5(33 DOWNTO 31) /= "000"
ELSE (7 => '1', OTHERS => '0') WHEN sum5(34) = '1' AND sum5(33
DOWNTO 31) /= "111"
ELSE (sum5(31 DOWNTO 24));
Output_Register_process : PROCESS (clk, reset)

BEGIN
IF reset = '1' THEN
output_register <= (OTHERS => '0');
ELSIF clk'event AND clk = '1' THEN
IF clk_enable = '1' THEN
output_register <= output_typeconvert;
END IF;
END IF;
END PROCESS Output_Register_process;
-- Assignment Statements
data_out <= std_logic_vector(output_register);
END rtl;
UART_Receiver
-----------------------------------------------------
library ieee ;
67
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;
-----------------------------------------------------
entity rx is
port( Din: in std_logic;
clock: in std_logic;
reset: in std_logic;
Dout: out std_logic_vector(7 downto 0);
Drdy: out std_logic
);
end rx;
-----------------------------------------------------
architecture FSM of rx is
-- define the states of FSM model
type state_type is (S0, S1);

signal next_state, current_state: state_type;
signal tmpdata:std_logic_vector(9 downto 0);
signal datardy:std_logic;
signal count: std_logic_vector(7 downto 0);
signal start: std_logic;
begin
-- cocurrent process#1: state registers

process(clock, reset)
begin
if (reset='1') then
current_state <= S0;
elsif (clock'event and clock='1') then
current_state <= next_state;
end if;
end process;
process(reset,Din,current_state)
begin
if (reset='1') or current_state=S0 then
start <= '0';
elsif current_state = S0 and (Din'event and Din='0') then
start <= '1';
end if;
end process;
process(clock, reset, current_state)
68
begin
if reset='1'or current_state = S0 then
count<="00000000";
elsif(clock'event and clock='1') then
count <= count + 1;
end if;
end process;
process(clock, count, Din,current_state)

begin
next_state<= current_state;
--if next_state<=current_state then
case current_state is
when S0 =>
if start <= '1' then
next_state <= S1;
datardy<='0';
else
next_state <= S0;
end if;
when S1=>
if count=x"18"then
next_state <= S1;
tmpdata(0)<=Din;
datardy<='0';
elsif count=x"28" then

next_state <= S1;
tmpdata(1)<=Din;
datardy<='0';
elsif count=x"38"then
next_state <= S1;
tmpdata(2)<=Din;
datardy<='0';
next_state <= S1;
tmpdata(3)<=Din;
datardy<='0';
next_state<= S1;
tmpdata(4)<=Din;
datardy<='0';
next_state<= S1;
69
tmpdata(5)<=Din;
datardy<='0';
next_state<= S1;
tmpdata(6)<=Din;
datardy<='0';
next_state<= S1;
tmpdata(7)<=Din;
datardy<='0';
dout <= tmpdata(7 downto 0);
next_state<= S1;
datardy<='1';
next_state<= S1;
datardy<='1';
elsif count=x"9D"then
next_state<= S0;
datardy<='0';
else
next_state<= S1;
end if;
when others =>
next_state<= S0;
end case;
-- end if;
end process;
Drdy <= datardy;
end FSM;
UART_Transmitter
library ieee;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;
entity Tx is
port ( clk16 : in std_logic; -- Input clock with 50 MHz clock signal
wr : in std_logic; -- Write strobe signal
rst : in std_logic; -- Reset button
din : in std_logic_vector(7 downto 0); -- 8 Bits data bus
Input
sdo : out std_logic; -- Serial transmit data line
tx_rd : out std_logic);-- Transmitter ready to Tx next byte
70
end Tx;
architecture Tx_arc of Tx is
signal txbr : std_logic_vector(7 downto 0);-- 8 Bits buffer register

signal txsr : std_logic_vector(7 downto 0);-- 8 Bits serial register
signal txd2 : std_logic; -- tag bits for detecting
signal txd1 : std_logic; -- empty shift register
signal clk : std_logic; -- Transmit clock or baud rate clock
signal txdone : std_logic; -- '1' when shifting of byte is done
signal txrd : std_logic; -- '1' when data is ready in txhold
begin
process (rst, clk16) -- Toggle clk every 8 counts, which divides the clock
frequency by 16
variable count: std_logic_vector(2 downto 0);
begin
if rst='1' then
clk <= '0' ;
count := (others=>'0') ;
elsif clk16'event and clk16='1' then
if (count = "000") then
clk <= not clk;
end if;
count := count + "001";
end if;
end process;
process (rst, clk)

begin
if rst='1' then
txsr <= (others=>'0') ;
txd1 <= '0' ;
txd2 <= '0' ;
sdo <= '0' ;
elsif clk'event and clk = '1' then
if (txdone and txrd) = '1' then -- Initialize registers and load next byte of
data
txsr <= txbr; -- Load tx register from txbr
txd2 <= '1'; -- Tag bits for detecting
txd1 <= '1'; -- when shifting is done
sdo <= '0'; -- Start bit to the serial port
else
txsr <= txd1&txsr(7 downto 1); -- Shift data
txd1 <= txd2;
txd2 <= '0';
if txdone = '1' then -- Shift out data
71
sdo <= '1'; -- Stop or idle bit
else
sdo <= txsr(0); -- Shift data bit
end if;
end if ;
end if;
end process;
txdone <= not(txd2 or txd1 or txsr(7) or txsr(6) or txsr(5) or txsr(4) or
txsr(3) or txsr(2) or txsr(1) or txsr(0)); -- txdone = 1 when done
shifting (When txtag2 has reached tx)
process (rst, clk16)

variable wr1,wr2: std_logic; -- write signal delayed 1 and 2 cycles or current
state and preivous state
variable txdone1: std_logic; -- txdone signal delayed one cycle
begin
if rst='1' THEN
txrd<= '0' ;
wr1 := '0' ;
wr2 := '0' ;
txdone1 := '0' ;
elsif clk16'event and clk16 = '1' then
if wr1 = '0' and wr2= '1' then -- Falling edge on write signal. New
data in txbr register
txrd <= '1';
elsif txdone ='0' and txdone1 = '1' then -- Falling edge on txdone signal.
Txbr has been read.
txrd <= '0';
end if;
wr2 := wr1; --
Delayed versions of write and txdone signals for edge detection
wr1 := wr ;
txdone1 := txdone;
end if ;
end process;
txbr <= din when wr = '1'else txbr; -- Latch data bus during write
tx_rd <= not txrd; -- Transmitter ready for write when no data
is in txbr
end Tx_arc;
72

Implementation of Digital Filter by Using FPGA

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Implementation of Digital Filter by Using FPGA

Uploaded by

Copyright:

Available Formats

Department of Electrical and Computer System

Implementation of digital filter by using FPGA

A thesis submitted for the degree of

Bachelor of Engineering in Computer Systems Engineering

Title: Implementation of digital filter by using FPGA

Author: Yishu Wang

Family name: Wang

Given name: Yishu

FPGAs, FIR Filter, Distributed Arithmetic

Good Average Poor

Friday, 10 June 2005

Professor Syed Islam,

In order to partially satisfy the requirements for the Bachelor of Engineering in

I would like to express my appreciation to my project supervisors Dr Yee Hong Leung

through the work.

Implementing hardware design in Field Programmable Gate Arrays (FPGAs) is a

Based on the design specification, careful choice of implementation method and

evaluating different implementation alternatives. Proper choice of the

computation algorithms can help the FPGA architecture to make it efficient in

terms of speed and/or area.

Figure2-1 FPGAs Parallel Approach to DSP Enables Higher Computational Through

1.1 Why implement the digital filter?

signal, the digital filter performs noiseless mathematical operations at each

option for removing noise, shaping spectrum, and minimizing inter-symbol

interference in communication architectures. These filters have become popular

because their precise reproducibility allows design engineers to achieve performance

levels that are difficult to obtain with analog filters.

example, with a relatively simple computational procedure we may obtain excellent

frequency band undistorted, it is desirable to have approximately constant frequency

desirable to design systems to have exactly or approximately linear phase.

in a straightforward manner, some iteration may be necessary to meet a prescribed

architecture. The hardware designer must choose between area, performance,

MATLAB combines the high-level, mathematical language with an extensive set of

MathWorks, which can generate a behavioral model and coefficient tables.

and FPGAs. (For any details pleaser refer to the chapter 7)

At the heart of the filter algorithm is the multiply-accumulate operation;

design modifications after production makes them less attractive.

tool. (For any details pleaser refer to the appendix B)

To program FPGA, hardware description language is needed.VHDL synthesis offers

an easy way to target a model towards different implementation.

1.3 Design outcome

Computer engineering 203 for the multidisciplinary purpose at the department of

outcome with outcome from other software package.

1.4 Overview of the report

Chapter 5 describes the simulation results in details.

2.1 Hardware Architecture

Deciding on an optimal architecture for a particular application involves tradeoffs

between area, performance, and response. The abundance of dedicated MAC

(Multiply-Accumulate) blocks in the FPGA devices present the option for

implementing either an area efficient “serial” or a high-performance “parallel” FIR

filter or something in between. Serial architectures share a single MAC resource,

Figure2-1 FPGAs Parallel Approach to DSP Enables Higher Computational Through

devices. High-performance, high-order filtering applications, that are able to exploit

dedicated multiplier or DSP blocks, often turns to FPGAs for a solution. By

X(n) X(n-1) X(n-2) X(n-3)

X(n) X(n-1) X(n-2)

Figure 2-3 Direct Form FIR Filter Architecture/Symmetric form

X(n) X(n-1) X(n-2)

Figure 2-4 Direct Form FIR Filter Architecture/Anti-Symmetric form

cascaded “multiply-add” operations such as the Xilinx Virtex 4, highest performance

is achieved using a “transposed” architecture. Utilizing the same resources as a

pleaser refer to the chapter 7)

X(n) X(n) X(n) X(n)