You are on page 1of 81

Department of Electrical and Computer System

Engineering

Implementation of digital filter by using FPGA

by

Yishu Wang

A thesis submitted for the degree of

Bachelor of Engineering in Computer Systems Engineering


School of Electrical and Computer Engineering

Title: Implementation of digital filter by using FPGA

Author: Yishu Wang

Family name: Wang

Given name: Yishu

Date Supervisor
Dr Yee Hong Leung, Dr Cesar Ortega-Sanchez

Degree Option
Bachelor of Engineering Computer System
Abstract

Implementing hardware design in Field Programmable Gate Arrays (FPGAs) is a formidable task. There
is more than one way to implement the digital FIR filter. Based on the design specification, careful choice
of implementation method and tools can save a lot of time and work. MatLab is an excellent tool to design
filters. There are toolboxes available to generate VHDL descriptions of the filters which reduce
dramatically the time required to generate a solution. Time can be spent evaluating different
implementation alternatives. Computation algorithms are required that exploit the FPGA architecture to
make it efficient in terms of speed and/or area.

Indexing Terms

FPGAs, FIR Filter, Distributed Arithmetic

Good Average Poor

Technical Work

Report Presentation

Examiner Co-Examiner
YishuWang
56 Lowan Loop
Karawara
Western Australia, 6152

Friday, 10 June 2005

Professor Syed Islam,


Head of Department,
Department of Electrical and Computer Engineering,
Curtin University of Technology,
Bentley, 6102, Australia.
Dear Professor Islam

In order to partially satisfy the requirements for the Bachelor of Engineering in


Computer Systems Engineering degree, I present the following thesis entitled
“Implementation of digital filters by using FPGAs”. I declare that this thesis is
entirely my own work except where credited otherwise.

Yours sincerely,

Yishu Wang
ACKNOWLEDGEMENTS

I would like to express my appreciation to my project supervisors Dr Yee Hong Leung

Dr Cesar Ortega-Sanchez and Dr Doug Myers for their patience and helpful suggestions

throughout the life of the project. Their knowledge and support throughout this project

has made this work possible. Their knowledge has proven invaluable in progressing

through the work.

I
Synopsis

Implementing hardware design in Field Programmable Gate Arrays (FPGAs) is a

formidable task. There is more than one way to implement the digital FIR filter.

Based on the design specification, careful choice of implementation method and

tools can save a lot of time and work. MatLab is an excellent tool to design filters.

There are toolboxes available to generate VHDL descriptions of the filters which

reduce dramatically the time required to generate a solution. Time can be spent

evaluating different implementation alternatives. Proper choice of the

computation algorithms can help the FPGA architecture to make it efficient in

terms of speed and/or area.

II
INDEX PAGE

TABLES OF FIGURES................................................................................................V
1 INTRODUCTION .................................................................................................1
1.1 Why implement the digital filter? ................................................................ 1
1.2 Design and implementation of digital FIR filter .......................................... 3
1.3 Design outcome............................................................................................ 4
1.4 Overview of the report ................................................................................. 4
2 ISSUES IN IMPLEMENTING THE DIGITAL FILTER .....................................6
2.1 Hardware Architecture ................................................................................. 6
2.2 FIR FILTER IMPLEMENT OPTIONS..................................................... 10
2.2.1 HDL coder.......................................................................................... 11
2.2.2 IP Generators...................................................................................... 11
2.2.3 User Defined ...................................................................................... 15
2.3 Solution to the current practice .................................................................. 16
3 DESIGN SPECIFICATIONS ..............................................................................18
3.1 Expected features of new system ............................................................... 18
3.2 Overview of the system.............................................................................. 19
3.3 Design specification of FIR filters ............................................................. 21
3.4 Overview of UART.......................................................................................... 23
3.4 Design specifications of UART ................................................................. 23
3.5 Design model for transmitter ..................................................................... 25
3.5.1 Design for transmitter ........................................................................ 25
3.5.2 Pseudo code for baud rate clock......................................................... 26
3.5.3 Pseudo code for transmitter................................................................ 26
3.6 Design model for receiver.......................................................................... 28
3.6.1 Design and pseudo code for serial to parallel converter .................... 28
3.6.2 Design for finite state machine and pseudo code for finite state
machine 31
3.6.3 Design and pseudo code for 8 bits counter ........................................ 31
3.6.4 Finite state machine for receiver ........................................................ 32
3.6.5 Pseudo code for receiver .................................................................... 32
3.7 Design for 16 times baud rate clock and it’s pseudo code......................... 35
4 Implementation on board .....................................................................................36
4.1 Overview .................................................................................................... 36
4.2 Onboard peripheral..................................................................................... 37
4.2.1 DDR SDRAM, Flash ......................................................................... 37
4.2.2 Clock Sources .................................................................................... 38
4.2.3 10/100/1000 Ethernet PHY, LCD Panel ............................................ 38
4.2.4 USB 2.0 to RS232 Port ...................................................................... 38
4.2.5 RS232, Configuration and Debug Ports............................................. 39
4.3 Configuration setup.................................................................................... 40
4.4 Connection with PC ................................................................................... 41
5 Simulation Results ...............................................................................................42
5.1 Device utilization summary ....................................................................... 42
5.2 Simulation of the behavior model .............................................................. 48
6 CONCLUSION ....................................................................................................50
7 BIBLIOGRAPHY ................................................................................................51
8 APPENDICES .....................................................................................................52

III
Appendix A: Background knowledge about filter design...................................... 52
Appendix B: filter design by using FDATool........................................................ 54
Appendix B: Circuit diagram................................................................................. 63
Appendix C: Code.................................................................................................. 64

IV
TABLES OF FIGURES

Figure2-1 FPGAs Parallel Approach to DSP Enables Higher Computational Through


Figure 2-2 “Unrolled” Direct Form FIR Filter Architecture
Figure 2-3 Direct Form FIR Filter Architecture/Symmetric form
Figure 2-4 Direct Form FIR Filter Architecture/Anti-Symmetric form
Figure 2-5 Transposed Direct form FIR Filter Architecture
Figure 2-6 Serial Distributed Arithmetic FIR Filter Architecture
Figure 2-7 True-down DSP Design Flow
Figure 2-8 Bottlenecks in the DSP Design Process
Figure 2-9 Proper implement techniques choices for the different Filter hardware
architecture
Figure 3.1 Overview of the system
Figure 3-2 Magnitude response of the 8 bit-input and 8 bit-output FIR filter
Figure 3-3 Phase response of the 8 bit-input and 8 bit-output FIR filter
Figure 3-4 Function diagram of the UART
Figure 3-5 Finite state machine of the receiver
Figure 4-1 onboard peripheral
Figure 4-2 USB2.0 to RS-232 port
Figure 4-3 Assign Pins for the designed system
Figure 5-1 Simulation behavior of the UART Transmitter
Figure 5-2 Simulation behavior of the FIR filter
Figure 5-3 Simulation behavior of the16 times baud rate clock
Table 8.1The four classes of FIR filter and their associate properties
Table 8.2 Suitability of a given class of filter for the four FIR types

V
VI
1 INTRODUCTION

1.1 Why implement the digital filter?

Digital filters are used extensively in all areas of electronic industry. This is because

Digital filters have the potential to attain much better signal to noise ratios than

analog filters and at each intermediate stage the analog filter adds more noise to the

signal, the digital filter performs noiseless mathematical operations at each

intermediate step in the transform. As the digital filters have emerged as a strong

option for removing noise, shaping spectrum, and minimizing inter-symbol

interference in communication architectures. These filters have become popular

because their precise reproducibility allows design engineers to achieve performance

levels that are difficult to obtain with analog filters.

FIR and IIR filters are the two common filter forms. A drawback of IIR filters is that the

closed-form IIR designs are preliminary limited to low pass, band pass, and high pass filters,

etc. Furthermore, these designs generally disregard the phase response of the filter. For

example, with a relatively simple computational procedure we may obtain excellent

amplitude response characteristics with an elliptic low pass filter while the phase response

will be very nonlinear. (For any details pleaser refer to the chapter 7)

In designing filters and other signal-processing system that pass some portion of the

frequency band undistorted, it is desirable to have approximately constant frequency

response magnitude and zero phases in that band. For casual systems, zero phases are

not attainable, and consequently, some phase distortion must be allowed. As the

effect of linear phase with integer slope is a simple time shift. A nonlinear phase, on

the other hand, can have a major effect on the shape of a signal, even when the

1
frequency-response magnitude is constant. Thus, in many situations it is particularly

desirable to design systems to have exactly or approximately linear phase.

Compare to IIR filers, FIR filters can have precise linear phase. Also, in the case of FIR

filters, closed-form design equations do not exist. While the window Method can be applied

in a straightforward manner, some iteration may be necessary to meet a prescribed

specification. the window method and most algorithmic methods afford the possibility of

approximating more arbitrary frequency response Characteristics with little more difficulty

than is encountered in the design of low pass filters. Also, it appears that the design problem

for FIR filters is much more under control than the IIR design problem because there is an

optimality theorem for FIR filters that is meaningful in a wide range of practical situations.

The magnitude and phase plots provide an estimate of how the filter will perform;

however, to determine the true response, the filter must be simulated in a system

model using either calculated or recorded input data. The creation and analysis of

representative data can be a complex task. Most of the filter algorithms require

multiplication and addition in real-time. The unit carrying out this function is called

MAC (multiply accumulate). Depends on how good the MAC is, the better MAC the

better performance can be obtained. Once a correct filter response has been determined

and a coefficient table has been generated, the second step is to design the hardware

architecture. The hardware designer must choose between area, performance,

quantization, architecture, and response. (For any details pleaser refer to the chapter 7)

2
1.2 Design and implementation of digital FIR filter

MATLAB combines the high-level, mathematical language with an extensive set of

pre-defined functions to assist in the creation and analysis of filter data. Toolbox are

available for designing filter response and generating coefficient tables, each with

varying levels of sophistication. Graphical filter design tools provide selections for

specifying passband, filter order, and design methods, as well as provide plots of the

response of the filter to various standard forms of inputs. FDAtool from The

MathWorks, which can generate a behavioral model and coefficient tables.

Once a correct filter response has been determined and hardware architecture has

been defined, the implementation can be carried out. Three choices of technology exist

for the implementation of filter algorithms. These are: Programmable DSP chips, ASICs

and FPGAs. (For any details pleaser refer to the chapter 7)

At the heart of the filter algorithm is the multiply-accumulate operation;

Programmable DSP chips typically have only one MAC unit that can perform one MAC

in less than a clock cycle. DSP processors or programmable DSP chips are flexible, but

they might not be fast enough. The reason is that the DSP processor is general purpose

and has architecture that constantly requires instructions to be fetched, decoded and

executed. ASICs can have multiple dedicated MACs that perform DSP functions in

parallel. But, they have high cost for low volume production and the inability to make

design modifications after production makes them less attractive.

FPGAs have been praised for their ability to implement filters since the introduction

of DSP savvy architectures . which can be efficiently realized using dedicated DSP

resources on these devices. More than 500 dedicated multiply-accumulate blocks are

now available, making them exceptionally well suited for high-performance, high-

order filtering applications that benefit from a parallel, non-resource shared hardware

3
architecture. In this particular project, FPGA has been chosen as the implementation

tool. (For any details pleaser refer to the appendix B)

To program FPGA, hardware description language is needed.VHDL synthesis offers

an easy way to target a model towards different implementation.

1.3 Design outcome

The objective of the project is to meet the demand for the development of

courseware in the teaching unit of Digital Signal Processing 304 together with

Computer engineering 203 for the multidisciplinary purpose at the department of

computer and electrical engineering. There are several results has been achieved

through the project life which are: examination of design procedures or design cycle

from different approaches. There is more than one way to implement the digital FIR

filter. The purpose of the project is to identify the different approaches compare and

contrast the different methods, so based on the design specification; careful choice of

implementation method can save designer a lot of time and work. To investigate

relationship among filter order, hardware architecture and FPGA resources usage; to

identify the relationship between the performance and filter hardware architecture;

Using MatLab as major design tool to design filters then compare and contrast the

outcome with outcome from other software package.

1.4 Overview of the report

Chapter 2 covers a brief overview of digital filter design and implementation tools

and methods in use and problems in creating digital filter by using the FPGA board.

Chapter 3 describes the achievement of the project and the methodology has been

used.

4
Chapter 4 describes the implementation of the FIR filter on the FPGA board

Chapter 5 describes the simulation results in details.

Chapter 6 overviews what have been achieved in the project and its future direction.

5
2 ISSUES IN IMPLEMENTING THE DIGITAL FILTER

2.1 Hardware Architecture

Deciding on an optimal architecture for a particular application involves tradeoffs

between area, performance, and response. The abundance of dedicated MAC

(Multiply-Accumulate) blocks in the FPGA devices present the option for

implementing either an area efficient “serial” or a high-performance “parallel” FIR

filter or something in between. Serial architectures share a single MAC resource,

making it very area efficient but at the expense of performance, because one clock-

cycle is required for each tap delay register. This architecture is an excellent choice

for area sensitive, low-performance applications. (For any details pleaser refer to the

chapter 7)

Figure2-1 FPGAs Parallel Approach to DSP Enables Higher Computational Through

Although commonly implemented on FPGAs, a serial MAC FIR filter does not

exploit the large number of dedicated hardware MAC units available on the newer

devices. High-performance, high-order filtering applications, that are able to exploit

dedicated multiplier or DSP blocks, often turns to FPGAs for a solution. By

replicating the multiply-adder logic once for each tap delay register, the entire

filtering calculation can be performed in a single clock cycle. With more than 500

6
DSP blocks available, even very large order filters can sustain input sampling rates

over 400 MHz. This type of performance far exceeds even the fastest DSP processors

by orders of magnitude.

Below are the block diagram for an “unrolled” implementation of a direct form FIR

filter and it’s symmetric and anti-symmetric form (For any details pleaser refer to the

chapter 7)

X(n) X(n-1) X(n-2) X(n-3)


Register Register Register
a(0) a(1) a(2) a(3)

y(n)=a(0)(x(n)-x(n-3))+a(1)(x(n-1)-x(n-2))
Figure 2-2 “Unrolled” Direct Form FIR Filter Architecture

X(n) X(n-1) X(n-2)


Register Register Register

a(1)

a(0)

y(n)=a(0)(x(n)+x(n-3))+a(1)(x(n-1)+x(n-2))

Figure 2-3 Direct Form FIR Filter Architecture/Symmetric form

X(n) X(n-1) X(n-2)


Register Register Register

a(1)
a(0)

y(n)=a(0)(x(n)-x(n-3))+a(1)(x(n-1)-x(n-2))

Figure 2-4 Direct Form FIR Filter Architecture/Anti-Symmetric form

7
When targeting an FPGA device with dedicated DSP blocks capable of supporting

cascaded “multiply-add” operations such as the Xilinx Virtex 4, highest performance

is achieved using a “transposed” architecture. Utilizing the same resources as a

“direct” form FIR filter, data samples are applied in parallel to all tap multipliers

through pipeline registers. The products are then applied to a cascaded chain of

registered adders, combining the effect of accumulators and registers. (For any details

pleaser refer to the chapter 7)

X(n)

X(n) X(n) X(n) X(n)

a(3) a(2) a(1) a(0)

y(n)
Register Register Register
a(3)X(n) a(3)X(n-1)+a(2)X(n) a(3)X(n-2)+a(2)X(n-1) y(n)=a(3)X(n-2)+a(2)X(n-1)+a(0)X(n)

Figure 2-5 Transposed Direct form FIR Filter Architecture

A fourth FIR filter architecture worth considering for FIR implementation in FPGAs

is called “distributed arithmetic” (DA). A simplified view of a DA FIR is shown in

Figure 2-8. In its most basic form, DA-based computations are bit-serial in nature –

serial distributed arithmetic (SDA) FIR. These computations can be parallelized

using multiple sets of Look-up table to achieve concurrency. This architecture is

referred to as parallel distributed arithmetic (PDA).

8
Figure 2-6 Serial Distributed Arithmetic FIR Filter Architecture

The advantage of the distributed arithmetic approach is its efficiency. The basic

operations required are a sequence of table look-ups, additions, subtractions, and

shifts of the input data sequence. These functions map efficiently into LUT-based

FPGA, such as Xilinx Virtex II, Virtex 4™ and Altera Stratix™ and Stratix II.

Distributed arithmetic architectures are an excellent choice for high-performance

filter applications targeting FPGA devices that do not include dedicated multipliers

or DSP48 blocks or when these resources are not available. A unique characteristic

of the DA architecture is that the sample rate is decoupled from the filter length,

making the DA architecture appealing for high-order filters. The trade off introduced

Here is one of silicon area (FPGA slices) for performance. As the filter length is

increased in a DA FIR filter, more logic resources are consumed, but throughput and

performance are maintained. The table below summarizes the benefits of each archite

cture.

Filter Architecture Pros Cons

Direct .Area efficient when .Performance decreases on


implemented using shared large order filters
multiply-accumulate
operations .Architecture does not
.Optimal architecture for allow for maximum
devices that offer a efficiency of DSP blocks
dedicated synchronous in the Virtex-4
multiply block such as the architecture
Xilinx Virtex-II and
Virtex-II Pro

9
Transposed  igh-performance when
H  ot as area efficient as
N
mapped to DSP blocks the direct form for small
that support multiply-add order filters
and multiply accumulate
operations such as the
Xilinx Virtex-4 and the
Altera Stratix-II

Distributed Arithmetic  ilter sample rate does


F  erformance diminishes
P
not decrease with an with increase in bit-widths
increase in filter order Consumes more area
Offers highest than direct or transposed
performance in FPGAs forms
when dedicated DSP or
synchronous multiplier
blocks are not available

Table 1 Pros and Cons of the different hardware architecture

Although this discussion on filter hardware architectures has been limited to constant

coefficient, single-rate, single-channel FIR filters, the concepts will apply for

programmable coefficient, multi-rate, and multi-channel variations. The adaptive

filters referenced in the overview, however, have distinct architectural characteristics

that depart significantly from FIR filters.

2.2 FIR FILTER IMPLEMENT OPTIONS

There is more than one way to reach the solution, proper choice of the

implementation tools and techniques can save the designer a lot of work and time.

10
2.2.1 HDL coder

HDL Coder is integrated with the graphical user interface and command line of the

Filter Design Toolbox to provide a unified design and implementation environment.

Filter Design and Analysis Tool (FDATool) or the MATLAB command line can be

used to design a filter and generate VHDL or Verilog code.The design specification

input to the Filter Design HDL Coder is quantized filters that can be create in one of

two ways: Design and quantize the filter with the Filter Design Toolbox, Design the

filter with the Signal Processing Toolbox and then quantize it with the Filter Design

Toolbox.

The Filter Design HDL Coder supports several important filter structures, including:

Direct Form Finite Impulse Response (FIR), Symmetric FIR, Anti-symmetric FIR

Transposed FIR, Direct Form I SOS, infinite Impulse Response (IIR), Direct Form I

Transposed SOS IIR, Direct Form II SOS IIR, and Direct Form II Transposed SOS II

Each of these IIR and FIR structures supports fixed-point and floating-point (double

precision) realizations. In addition, the FIR structures also support unsigned fixed-

point coefficients.

2.2.2 IP Generators

When implementing the actual filter on an FPGA, the designer is presented with a

fundamental choice of whether to use an IP core or design a custom implementation.

Both options have merits and limitations. FIR filter IP cores are readily available

from multiple sources with the most common forms being technology-specific enlists,

synthesizable RTL, and synthesizable MATLAB. All these generators can construct

a filter using a pre-defined coefficient table.

11
Xilinx and Altera both offer high-performance but device-specific filter IP in the

form of technology mapped enlists. This form of IP is typically the most cost

effective and offers the best results but provides the fewest options. Often each core

is hand-crafted to leverage device specific hardware resources, such as Xilinx Block

RAM, and includes placement information which helps maintain performance as

filter size grows. The format of the IP, however, precludes user modification of the

source or retargeting to a different device. The MathWorks offers device independent

filter IP; however, they don’t provide the ability to exploit the high-performing

resources of the FPGA. This form of IP tends to be a bit more expensive but offers

the user the additional options of retarget ability and modification of the RTL source.

The quality of the results will vary with the quality of the RTL model and the RTL

synthesis tool used for implementation. In general, however, some area and

performance is sacrificed to obtain general purpose, retarget able RTL models.

AccelChip offers both high-performance and device-independent IP available in

AccelWare® IP Toolkits. AccelWare IP cores are provided as synthesizable

MATLAB models that offer several advantages. First, the abstraction level of the IP

allows for the greatest range of options. Describing a filter in MATLAB is much

easier than creating a technology-specific placed netlist or RTL. This allows

AccelChip to build a wider range of hardware architecture options into the model,

providing the user the greatest number of choices. Secondly, the architecture of the

final hardware can be further modified during the DSP synthesis process with

AccelChip through the use of “directives.” This provides an efficient means of

leveraging the architectural features of the specific FPGA device, such as dedicated

DSP blocks, RAMs, ROMs, and shift registers, to achieve high quality of results in

12
the target device. The table below summarizes the synthesis directives available in

AccelChip. (For any details pleaser refer to the chapter 7)

AccelChip Synthesis Effect on Results


Directive

Rolling/unrolling of For loops Improves sustainable data throughput


rate by reducing the number of cycles for
processing per input sample

Expansion of vector and matrix Improves sustainable data throughput by


additions and multiplications reducing the number of cycles for
processing per input sample

RAM/ROM memory mapping of Improves FPGA utilization by mapping


1D and 2D arrays 1D and 2D arrays into dedicated FPGA
RAM resources

Pipeline insertion Improves sustainable data throughput


rate by improving clock frequency
performance

Shift register mapping Improves FPGA utilization by mapping


data buffers to shift register logic

IP generators provide an excellent choice when time-to-market or ease-of-use

demands prevail. The general purpose nature of these programmable IP generators,

however, seldom yields the optimal hardware for a specific application. For this, a

user-defined architecture should be considered.

13
Figure 2-7 True-down DSP Design Flow

14
2.2.3 User Defined

FIR filter implementations with aggressive area or performance targets may require

the use of a hand-crafted implementation over a pre-defined IP block. Doing so

allows the designer to exploit application-specific characteristics, such as unique

impulse response, coefficient symmetry, ones in the coefficient table, negative

coefficients, coefficient length, or knowledge about the dynamic range of the input

data to the filter. Most filter IP generators include some simple coefficient symmetry

optimization but do not account for every possible situation.

User-defined implementations have traditionally been achieved using hand-coded

RTL. This approach offers the designer a high degree of control over the

implementation process but is decoupled from the MATLAB simulation environment

and can be time consuming. A typical 32 tap FIR filter Verilog model can range

between 300 – 500 lines of hand-created VHDL or Verilog code and will often

require the instantiation of FPGA-specific RAMs or DSP blocks to achieve optimal

results.

The quantization must be determined for each signal and register in the design,

which typically requires a re-write of the original floating-point model in language

that supports fixed-point modeling and filter response analysis. Once complete, the

final RTL code must be verified back to the original filter model to insure functional

correctness. A recent survey conducted by AccelChip found that designers were

evenly split between these major DSP design tasks when asked to identify their most

significant bottlenecks.

15
Figure 2-8 Bottlenecks in the DSP Design Process

2.3 Solution to the current practice

Implementing hardware design in Field Programmable Gate Arrays (FPGAs) is a

formidable task. There is more than one way to implement the digital FIR filter.

Based on the design specification, careful choice of implementation method and

tools can save a lot of time and work. MatLab is an excellent tool to design filters.

There are toolboxes available to generate VHDL descriptions of the filters which

reduce dramatically the time required to generate a solution. Time can be spent

evaluating different implementation alternatives. Proper choice o f computation

algorithms can improve the FPGA architecture to make it efficient in terms of

speed and/or area.

16
FIR filter Design Methode

Design Tool (MATLAB, MATCAD....) Implemenation Tool(Xilinx, Altera...)

FDATool Look-up Hardware Description C Language


Table Language
FVTool Accelware
Generations
Multi-Rate FIR filter
(Idea form) Distribute Sym FIR using (SDA)
Direct Arithmetic Non-Sym FIR using (SDA)
Direct-Transposed Multi-Rate FIR filter Sym FIR using (PDA)
Symmetric (User defined input data) Non-Sym FIR using (PDA)
Anti-Symmetric

Figure 2-9 Proper implement techniques choices for the different Filter hardware architecture

Iinvestigation through the first phase of the project has found that, in the current

market there is so many design, implementation tools, each of them has the

advantage and disadvantage in some aspects .Conclusion can be draw from the

investigation is that to implement the linear phase filter MATLAB or MATCAD

is the best choice as there are several toolbox available which remove the

complexity for the design and implementation. To design and implement the

Multi-rate FIR filter, Filter visualisation tool from MATLAB and Accelware

generations software package is the best choice. For the distribute arithmetic

hardware architecture the best way to implement it is to use the look-up table

technique also distribute arithmetic algorithm is the best choice for implementing

FIR filters with Xilinx FPGAs. One of these implementations, namely, Serial

Distributed Arithmetic (SDA) exhibits efficiency in terms of area. The other

implementation, named Parallel Distributed Arithmetic (PDA) shows a very high

sample rate. Linear-phase response FIR filters were also developed and

implemented in an FPGA using the DA technique.

17
3 DESIGN SPECIFICATIONS

3.1 Expected features of new system

The objective of this system is to provide a hardware platform to test FIR filters that

have been generated using MATLab. The system performs the following functions:

1. Communicate with a PC using a standard RS-232 serial interface.

2. Receive a data file from the PC. Data will be received in binary form so they

can be applied to the FIR filter without further transformation. For this

exercise consider that the FIR filter receives 8-bit samples.

3. The outputs of the FIR filter must be returned to the PC where they will be

stored in a file. Again assume data from the FIR filter have to be sent the

outputs in binary form (without transformation).

4. To send and receive the files from/to the PC a program like “HyperTerminal”

can be used. The serial port must be configured with the following settings:

Bauds rate is 57600 which is 54 times slower than the 50MHz crystal clock

onboard;1 start bit, 1 stop bit ,8 data bits, No parity ,No flow control.

18
3.2 Overview of the system

8
DRdy

PC RS-232
interf UART FIR filter
ace 8
WrD EOF

50 MHz
Clock for on- board
Clock
baud rate oscillato
divider
r
FPGA development board

Figure 3.1 Overview of the system

In above figure, there are two blocks that do not have to be designed: the RS-232

interface, it is provided as part of the development board, and the FIR filter, it is

implemented using MATLab.

The blocks that have to be developed are the Universal Asynchronous Receiver-

Transmitter (UART), the clock divider. A brief description of each one of these

blocks provided next.

UART must receive and transmit data in a serial format. Every frame that is

transmitted contains 1 start bit, 8 bits of information and 1 stop bit. The function of

the UART can be divided two: the transmission and the reception circuits. In this

particular application, the FIR filter receives 8-bit numbers at its input. Hence, it is

possible to feed the filter directly from the UART.

The receiver must perform the following functions:

• Detect the start of a frame by sensing the start bit.

19
• Sample the 8 bits of information at the frequency corresponding to the baud

rate, least significative bit first.

• Present the 8 bits of information as one byte to the FIR filter.

• Generate a signal DRdy to latch the new data into the filter. This signal must

work as a “write” signal to the filter.

The transmitter must perform the following functions:

• Receive 8-bit data from the FIR filter. Data will be latched in the transmitter

using the WrD signal generated by the data manager.

• Transmit the data in serial format at the baud rate, least significant bit first. 1

start bit at the beginning and 1 stop bit at the end of the frame must be

inserted

Clock divider in this circuit must divide the 50MHz clock available on the board to

deliver a frequency related to the baud rate. This new clock has to be used to receive

and send data over the serial channel at the correct speed.

FIR filter with its function is to receive 8 bit data from the UART and apply the

filtering process for the incoming data. This circuit must provide an interface

compatible with the UART. In figure above, this interface is represented by the

signal EOF (End Of Filtering) that indicates when the filter has a new data available

on its output.

20
3.3 Design specification of FIR filters

Low-pass FIR filter has been implemented with


• Response type: Low pass;

• Design method: Equi ripple;

• Density factor: 20;

• Filter order: order 5;

• Hardware architecture: Direct form;

• Sampling frequency: 48000Hz;

• Pass band frequency: 9600Hz;

• Stop band frequency: 12000Hz;

• Input data length: 8 bits;

• Output data length: 8 bits;

Figure 3-2 Magnitude response of the 8 bit-input and 8 bit-output FIR filter

21
Figure 3-3 Phase response of the 8 bit-input and 8 bit-output FIR filter

22
3.4 Overview of UART

Figure 3-4 Function diagram of the UART

The UART is for interfacing computers or microprocessors to an asynchronous serial

data channel. The receiver converts serial start, data, parity and stop bits. The

transmitter converts parallel data into serial form and automatically adds start, parity

and stop bits. The data word length can be 5, 6, 7 or 8 bits. Parity may be odd or even.

Parity checking and generation can be inhibited. The stop bits may be one or two or

one and one-half when transmitting 5-bit code. It can be used in a wide range of

applications including modems, printers, peripherals and remote data acquisition

systems. (For any details pleaser refer to the chapter 7)

3.4 Design specifications of UART

In the UART (Universal Asynchronous Receive and Transmission) all data

transmit or received in unit of bit, at the beginning and the end of the data frame

there are the start bit and the stop bit respectively.

23
Figure 3-4 Flow chart representation of the system

Start bit is logic 0, it always stays in the front of the data frame in order to remain

the receiver to receive the data, in the process of the receiving the start bit will be

removed. According to the TCP/IP, it allows to have the data frame within the

range of the 5 to 8. Usually, the data frame is7 or 8 bits, if it required to

transmitting ASCII data (Assume all data in binary number), in this design the

data frame is in the length of the 8 bits. When the data was transmitted start from

the LSB and the MSB is at the end of the data frame. Such as the letter C in the

ASCII code it can be represented in 67 in decimal number and 01000011 in the

binary number, when it started to transmitted it will become 11000010. There is

something wrong with the data frame. Stop bit is logic 1, it always stays at the

end of the data frame, and it can be 1 bit, 1.5bit or 2 bits. In the project, it is 1 bit.

And there is no parity bit and framing error bit. Baud Rate is the number of the

bits can be received and transmitted within a second. If the time need for

transmitting and receiving one bit is t then the baud rate will be 1 over t, the

corresponding transmit and receive clock is 1/t Hz. The baud rate for the

transmitter and receiver should be consistent otherwise the error can be occurred,

24
in this particular design the baud rate has been assumed as 54 times slower than

the clock signal on the board.

3.5 Design model for transmitter

To simplify the design for the project, there is a start bit, 8 bits data in length and a

stop bit and there is no parity bit, assume the baud rate is 54 times slower than the

clock signal from the board.

3.5.1 Design for transmitter

According to design specification, it needs to send 10bits data (1 start bit, 8 bits data,

1 stop bit), after the stop bits, the transmitter will stop and the logic level will stay

back to logic1, then it will stay in idle state to wait for the next transmission. The

transmitter section accepts parallel data, formats the data and transmits the data in

serial form on the serial data output (sdo) terminal. Data is loaded from the inputs din

7 –din0 into the transmitter buffer register by applying logic high on the write input

port. Valid data must be present at least tset prior to and thold following the rising

edge of write signal. If words less than 8 bits are used, only the least significant bits

are transmitted. The character is right justified, so the least significant bit

corresponds to txbr(0). The rising edge of write signal clears transmitter buffer

register. 0 to 1 Clock cycles later, data is transferred to the transmitter register, the

tx_rd pin goes to a low state, tx_rd is set high and serial data information is

transmitted. The output data is clocked at a clock rate 16 times the data rate. A

second high level pulse on write loads data into the transmitter buffer register. Data

transfer to the transmitter register is delayed until transmission of the current data is

complete. Data is automatically transferred to the transmitter register and

25
transmission of that character begins one clock cycle later.

3.5.2 Pseudo code for baud rate clock

If the logic level 1 at the input port reset has been detected then

The clk signal is logic 0

Variable count has been initialized to 000

Elsif there is a rising edge of the 16 times faster baud rate clock then

If the variable count is in the initial state 000 then

Invert the clk signal

Variable count increment by 1

3.5.3 Pseudo code for transmitter

If the logic level 1 at the input port reset has been detected then

The clk signal is logic 0

Variable count has been initialized to 000

Elsif there is a rising edge of the 16 times faster baud rate clock then

If the variable count is in the initial state 000 then

Invert the clk signal

Variable count increment by 1

If the logic level 1 at the input port reset has been detected then

Signal of the 8 bits serial register is 00000000

Signal of the empty shift register is logic 0

Signal of tag bits for detecting of the start bit is logic 0

Output of the transmitter is logic 0

Elsif there is rising edge of the baud rate clock then

26
If shifting of the byte is done and data is ready in txhold then

Load transmitter register from 8 bits buffer

Tag bits for detecting is on

Empty shift register is on

Send logic 0 start bit into the output port sdo

Else

Shift data

If one byte of data has been shift out then

Shift out the stop bit or idle state

Else

Shift the least significant bit of the serial register txsr(0)

If the logic level 1 at the input port reset has been detected then

Send the logic 0 into the signal txrd

Variable write signal delayed 1 wr1 is logic 0

Variable write signal delayed 2 wr2 is logic 0

Variable txdone1 is logic 0

Elsif there is a rising edge of the16 times faster baud rate clock then

If there is falling edge on write singal

Signal txrd is in the high state

Elsif there is a falling edge on txdone signal( Txbr has been read)

Send the logic 0 into the signal txrd

Variable wr2 an variable wr1 is equal

Send the write strobe signal from input port wr to the varible wr2

Send the logic level of the txdone to the variable txdone1

Latch data bus during write

27
Transmitter ready for write when no data is in the transmitter buffer register

3.6 Design model for receiver

Data is received in serial form at the input port Din. When no data is being received,

Din must remain high. The data is clocked through the Clock. The clock rate is 16

times the data rate.

The UART receiver is implemented as a structural model. Some parts of it are:

1) Finite state machine

2) 8 bits counter

3) Serial to parallel converter

3.6.1 Design and pseudo code for serial to parallel converter

The serial to parallel converter is controlled by finite state machine which has only

two states. State S0 is idle and S1 is data tapping state. Output of FSM is control

signal for serial to parallel converter which uses a counter which counts till certain

states to keep track of data and simultaneously tapping data in an 10 bit logic vector.

After tapping of tenth bit the counter signals to finite state machine back to state S1.

When the current state is state S0

If the start bit of the incoming data has been detected then

Go the next state S1

The datardy signal is in the lower state 0

If the start bit of the incoming data has not been detected or idle state

The next state is state S1

The datardy signal is in the lower state 0

28
When the current state is in the state S1 then

If the counter is in the state hex number 18 then

Sampling the start bit into the tmpdata register in bit tmpdata(0);

The datardy signal is in the lower state 0

The next sate is state S1

ElsIf the counter is in the state hex number 28 then

Sampling the incoming data Din into the tmpdata register in bit tmpdata(1);

The datardy signal is in the lower state 0

The next sate is state S1

ElsIf the counter is in the state hex number 38 then

Sampling the incoming data Din into the tmpdata register in bit tmpdata(2);

The datardy signal is in the lower state 0

The next sate is state S1

ElsIf the counter is in the state hex number 48 then

Sampling the incoming data Din into the tmpdata register in bit tmpdata(3);

The datardy signal is in the lower state 0

The next sate is state S1

ElsIf the counter is in the state hex number 58 then

Sampling the incoming data Din into the tmpdata register in bit tmpdata(4);

The datardy signal is in the lower state 0

The next sate is state S1

ElsIf the counter is in the state hex number 68 then

Sampling the incoming data Din into the tmpdata register in bit tmpdata(5);

The datardy signal is in the lower state 0

The next sate is state S1

29
ElsIf the counter is in the state hex number 78 then

Sampling the incoming data Din into the tmpdata register in bit tmpdata(6);

The datardy signal is in the lower state 0

The next sate is state S1

ElsIf the counter is in the state hex number 88 then

Sampling the incoming data Din into the tmpdata register in bit tmpdata(7);

The datardy signal is in the lower state 0

Output the data from the buffer register into the output port

The next sate is state S1

ElsIf the counter is in the state hex number 95 then

The datardy signal is in the high state 1

The next sate is state S1

ElsIf the counter is in the state hex number 98 then

The datardy signal is in the high state 1

The next sate is state S1

ElsIf the counter is in the state hex number 9D then

The datardy signal is in the low state 0

Go back to the state S0

Else the counter is in the other state

Go back to the state S0

30
3.6.2 Design for finite state machine and pseudo code for finite state machine

The finite state machine is triggered from state S0 to state S1 when there is a rising

edge of the clock signal; it stays at state S0 when the reset signal is at the higher level.

If the signal at the input port reset has been scanned as logic 1 then

The current state is state S0

If there is a rising edge of the clock signal then

The current change to the next state S1

3.6.3 Design and pseudo code for 8 bits counter

The 8 bits counter signals at the predefined state; the 8 bit counter uses this signal as

its clock to tap data. It is counting 11 such signals from 8 bit counter.

If the logic level at the reset input port has been scanned as the logic 1 or the current

state is in the state S0

Then counter initialize itself to the “00000000”

Elsif there is the rising edge of the clock signal the

Counter increment by 1

31
3.6.4 Finite state machine for receiver

If there is falling edge of the incoming data Din

Send data Do nothing


S0
S10
S1

Load the the incoming data to the bre(8) S9

S2 Load the the incoming data to the bre(1)

S8

Load the the incoming data to the bre(7) Load the the incoming data to the bre(2)
S3

S7

Load the the incoming data to the bre(6)


S4 Load the the incoming data to the bre(3)
S6

S5
Load the the incoming data to the bre(5) Load the the incoming data to the bre(4)

Figure 3-5 Finite state machine of the receiver

3.6.5 Pseudo code for receiver

If the current state is state S0 and If there is the falling edge of the

incoming data has been detected (start bit 0 has been found) then

Go to the state S1

Else stay in the state S0 wait until the start bit has been detected

When the current the state is in state S1 then

If the state control signal hex number 18 has been detected

Then stay in the state S1

Sampling the incoming data into the tmpdata register tmpdata bit 0

Apply the signal logic 0 to Drdy

When the current the state is in state S1 then

If the state control signal hex number 28 has been detected

32
Then stay in the state S1

Sampling the incoming data into the tmpdata register tmpdata bit 1

Apply the signal logic 0 to Drdy

When the current the state is in state S1 then

If the state control signal hex number38 has been detected

Then stay in the state S1

Sampling the incoming data into the tmpdata register tmpdata bit 2

Apply the signal logic 0 to Drdy

When the current the state is in state S1 then

If the state control signal hex number 48 has been detected

Then stay in the state S1

Sampling the incoming data into the tmpdata register tmpdata bit 3

Apply the signal logic 0 to Drdy

When the current the state is in state S1 then

If the state control signal hex number 58 has been detected

Then stay in the state S1

Sampling the incoming data into the tmpdata register tmpdata bit 4

Apply the signal logic 0 to Drdy

When the current the state is in state S1 then

If the state control signal hex number 68 has been detected

Then stay in the state S1

Sampling the incoming data into the tmpdata register tmpdata bit 5

Apply the signal logic 0 to Drdy

When the current the state is in state S1 then

If the state control signal hex number 78 has been detected

33
Then stay in the state S1

Sampling the incoming data into the tmpdata register tmpdata bit 6

Apply the signal logic 0 to Drdy

When the current the state is in state S1 then

If the state control signal hex number 88 has been detected

Then stay in the state S1

Sampling the incoming data into the tmpdata register tmpdata bit 7

Apply the signal logic 0 to Drdy

Output the all sampling data to the port dout

When the current the state is in state S1 then

If the state control signal hex number 95 has been detected

Then stay in the state S1

Apply the signal logic 1 to Drdy

When the current the state is in state S1 then

If the state control signal hex number 98 has been detected

Then stay in the state S1

Apply the signal logic 1 to Drdy

When the current the state is in state S1 then

If the state control signals hex number 9D has been detected

Then go back the state S0

Apply the signal logic 0 to Drdy

Else

Go back to the state S1

When the State is not state S0 or S1 then

Go back to the state S0

34
3.7 Design for 16 times baud rate clock and it’s pseudo code

To generate baud rate of 16 times faster than the baud rate of 57600 frequency of

onboard clock needs to be divided by (16*57600 = 921600) which is 54.2534.

After rounding it becomes integer number 54.

If the signal 1 has been scanned at the input port en

If there is rising edge of the 50MHz clock

Then count incremented by 1

If there is 54 pulses has been passed

Then the output the logic 1

And the variable count has been initialized to 0

Else output logic 0

35
4 Implementation on board

4.1 Overview

The Virtex-4™ FX12 LC Development kit provides a complete development

platform for designing and verifying applications based on the Xilinx Virtex-4 FPGA

family. This kit enables designers to implement DSP and embedded processor based

applications with extreme flexibility using IP cores and customized modules. The

Virtex-4 FPGA along with its integrated PowerPC processor core makes it possible

to prototype processor based applications, enabling software designer early access to

a hardware platform prior to working with the final product/target board. The Virtex-

4 FX12 LC system board utilizes the Xilinx XC4VFX12-10FF668C FPGA. The

board includes 64MB of DDR SDRAM, 4MB of Flash, USB-RS232 Bridge, a

10/100/1000 Ethernet PHY, 100 MHz clock source, RS-232 port, and additional user

support circuitry to develop a complete system. The board also supports the P160

expansion module standard, allowing application specific expansion modules to be

easily added.

36
4.2 Onboard peripheral

Figure 4-1 Onboard peripheral

4.2.1 DDR SDRAM, Flash

The Virtex-4™ FX12 LC development board provides 64MB of DDR SDRAM

memory (x16). A high-level block diagram of the DDR SDRAM interface is shown

below followed by a table describing the SDRAM memory interface signals. The

Virtex-4™ FX12 LC development board provides 4MB of flash memory (x16). A

high-level block diagram of the flash interface is shown below followed by a table

describing the flash memory interface signals.

37
4.2.2 Clock Sources
The Clock Generation section of the Virtex-4 FX12 LC board provides all the

necessary clocks for the PowerPC processor, the I/O devices located on the board, as

well as the DDR SDRAM memory. An on-board 100MHz oscillator provides the

system clock input to the processor section. This 100Mhz clock will be used by the

Virtex-4 Digital Clock Managers (DCMs) to generate various processor clocks. In

addition to the above clock inputs, a socket is provided on the board that can be used

to provide single ended LVTTL clock input to the FPGA via an 8 or 4-pin oscillator.

4.2.3 10/100/1000 Ethernet PHY, LCD Panel

The Virtex-4 FX12 LC development board provides a 10/100/1000 Ethernet port for

network connection. This interface uses the Virtex-4 embedded 10/100/1000 MAC.

A high-level block diagram of the 10/100/1000 Ethernet interface is shown in the

following figure followed by FPGA pin assignments for this interface.

The Virtex-4 FX12 LC development board provides an 8-bit interface to a 2x16 LCD

panel (MYTECH MOC-16216B-B). The following table shows the LCD interface

signals.

4.2.4 USB 2.0 to RS232 Port


The Virtex-4 FX12 LC development board implements a USB 2.0 port. This is

accomplished using the Cygnal CP2101 USB-to-UART Bridge Controller. The

FPGA interfaces to the CP2102 as a simple UART. The UART interface to the

CP2102 can run at speeds ranging from 300 to 921,600 baud. The CP2102 is a highly

integrated USB-to-UART Bridge Controller, providing a simple solution for USB

serial communications using a minimum of components and PCB space. The

CP2102 includes a USB 2.0 full-speed function controller, USB transceiver,

oscillator, EEPROM, and asynchronous serial data bus (UART) with full modem

38
control signals in a compact 5mm X 5mm MLP-28 package. No other external USB

components are required. The on-chip EEPROM may be used to customize the USB

Vendor ID, Product ID, Product Description String, Power Descriptor, Device

Release Number, and Device Serial Number as desired. The EEPROM is

programmed on-board via the USB allowing the programming step to be easily

integrated into the product manufacturing and testing process. Royalty-free Virtual

COM Port (VCP) device drivers provided by Cygnal allow the Virtex-4 FX12 LC

development board to appear as a COM port to PC applications. The CP2102 UART

interface implements all RS232 signals, including control and handshaking signals.

These signals are interfaced to the Virtex-4 FPGA as follows:

Figure 4-2 USB2.0 to RS-232 port

4.2.5 RS232, Configuration and Debug Ports

The Virtex-4 FX12 LC development board provides an RS232 interface with RX and

TX signals and jumpers for connecting the RTS and CTS signals. The following

figure shows the RS232 interface to the Virtex-4 FX12 FPGA.Various methods of

configuration and debug support are provided on the Virtex-4 FX12 development

board to assist designers during the testing and debugging of their applications. The

following sections provide brief descriptions of each of these interfaces.

39
4.3 Configuration setup

The configuration for implementation on the board has been set up as the following.

For the 50MHz clock onboard Pin 184 has been assigned For the USB switch P16

has been assigned. For the reset button Pin 22 has been assigned. For the serial data

input into the receiver Pin11 has been assigned. For the serial data output from the

transmitter Pin 3 has been assigned. (For any details pleaser refer to the chapter 7 for the

reference reading)

Figure 4-3 Assign Pins for the designed system

40
4.4 Connection with PC

Design and implementation needs to be done at the host computer system before it

downloads the configuration file into the FPGA. Connection needs to be made

between the host computer system and target system by using connection cable.

Figure 4-3 Connection between the host system and the target system

A terminal program called hype terminal is an application that will enable a PC to

communicate directly with a modem. This can be useful to test the overall system

design and diagnose problems. For this particular design the baud rate for the

HyperTerminal has been setup as 57600 baud rate. (For any details pleaser refer to the

chapter 7 for the reference reading)

41
5 Simulation Results

5.1 Device utilization summary

As one of the objectives of the project is to investigate the resources usage inside of

FPGA, simulation result shown that the amount of the component has been used in

the FPGA as shown in the following diagram for the linear phase FIR filter for the

order 5, order 10 and order 15.

Direct FIR Order 5


Selected Device : 3s400pq208-4
Number of Slices: 143 out of 3584 3%
Number of Slice Flip Flops: 131 out of 7168 1%
Number of 4 input LUTs: 172 out of 7168 2%
Number of bonded IOBs: 54 out of 141 38%
Number of MULT18X18s: 6 out of 16 37%
Number of GCLKs: 1 out of 8 12%

Transposed FIR Order 5


Selected Device : 3s400pq208-4
Number of Slices: 117 out of 3584 3%
Number of Slice Flip Flops: 221 out of 7168 3%
Number of 4 input LUTs: 173 out of 7168 2%
Number of bonded IOBs: 54 out of 141 38%
Number of MULT18X18s: 3 out of 16 18%
Number of GCLKs: 1 out of 8 12%

Symmetric FIR Order 5


Selected Device : 3s400pq208-4
Number of Slices: 117 out of 3584 3%
Number of Slice Flip Flops: 131 out of 7168 1%
Number of 4 input LUTs: 120 out of 7168 1%
Number of bonded IOBs: 55 out of 141 39%
Number of MULT18X18s: 3 out of 16 18%
Number of GCLKs: 1 out of 8 12%

Anti-Symmetric FIR Order 5


Selected Device : 3s400pq208-4
Number of Slices: 143 out of 3584 3%
Number of Slice Flip Flops: 131 out of 7168 1%
Number of 4 input LUTs: 172 out of 7168 2%

42
Number of bonded IOBs: 54 out of 141 38%
Number of MULT18X18s: 6 out of 16 37%
Number of GCLKs: 1 out of 8 12%

Direct FIR order 10


Selected Device : 3s400pq208-4
Number of Slices: 206 out of 3584 5%
Number of Slice Flip Flops: 212 out of 7168 2%
Number of 4 input LUTs: 210 out of 7168 2%
Number of bonded IOBs: 55 out of 141 39%
Number of MULT18X18s: 7 out of 16 43%
Number of GCLKs: 1 out of 8 12%

Transposed FIR Order 10


Selected Device : 3s400pq208-4
Number of Slices: 213 out of 3584 5%
Number of Slice Flip Flops: 398 out of 7168 5%
Number of 4 input LUTs: 213 out of 7168 2%
Number of bonded IOBs: 55 out of 141 39%
Number of MULT18X18s: 4 out of 16 25%
Number of GCLKs: 1 out of 8 12%

Symmetric FIR Order 10


Selected Device : 3s400pq208-4
Number of Slices: 180 out of 3584 5%
Number of Slice Flip Flops: 212 out of 7168 2%
Number of 4 input LUTs: 156 out of 7168 2%
Number of bonded IOBs: 56 out of 141 39%
Number of MULT18X18s: 4 out of 16 25%
Number of GCLKs: 1 out of 8 12%

Anti-Symmetric FIR Order 10


Selected Device : 3s400pq208-4
Number of Slices: 188 out of 3584 5%
Number of Slice Flip Flops: 212 out of 7168 2%
Number of 4 input LUTs: 174 out of 7168 2%
Number of bonded IOBs: 55 out of 141 39%
Number of MULT18X18s: 6 out of 16 37%
Number of GCLKs: 1 out of 8 12%

Direct FIR Order 15


Selected Device : 3s400pq208-4
Number of Slices: 413 out of 3584 11%
Number of Slice Flip Flops: 292 out of 7168 4%
Number of 4 input LUTs: 534 out of 7168 7%

43
Number of bonded IOBs: 55 out of 141 39%
Number of MULT18X18s: 16 out of 16 100%
Number of GCLKs: 1 out of 8 12%

Transposed FIR Order 15


Selected Device : 3s400pq208-4
Number of Slices: 298 out of 3584 8%
Number of Slice Flip Flops: 587 out of 7168 8%
Number of 4 input LUTs: 537 out of 7168 7%
Number of bonded IOBs: 55 out of 141 39%
Number of MULT18X18s: 8 out of 16 50%
Number of GCLKs: 1 out of 8 12%

Symmetric FIR Order 15


Selected Device : 3s400pq208-4
Number of Slices: 346 out of 3584 9%
Number of Slice Flip Flops: 293 out of 7168 4%
Number of 4 input LUTs: 389 out of 7168 5%
Number of bonded IOBs: 56 out of 141 39%
Number of MULT18X18s: 8 out of 16 50%
Number of GCLKs: 1 out of 8 12%

Anti-Symmetric FIR Order 15


Selected Device : 3s400pq208-4
Number of Slices: 413 out of 3584 11%
Number of Slice Flip Flops: 292 out of 7168 4%
Number of 4 input LUTs: 534 out of 7168 7%
Number of bonded IOBs: 55 out of 141 39%
Number of MULT18X18s: 16 out of 16 100%
Number of GCLKs: 1 out of 8 12%

From the observation simulation result the conclusions can be drawn is that for the

same filter order, direct hardware architecture takes larger resources than the

transposed architecture. Anti-Symmetric hardware architecture takes larger resources

than the symmetric hardware architecture as symmetric hardware architecture shares

some component in common so the result of the simulation is reasonable.

44
Resources Usage for Direct FIR filter

600

500

400

300

200
Order 5

Order 10
100

Order 15
0
Slice FFs LUTs IOBs MULT GCLKs

Observation from the graph has found that for direct form FIR filter architecture as

the filter order increases the amount of the resources usage is increasing correspond.

Resources Usage for Transposed form FIR filter

600

500

400

300

200 Order 5

100
Order 10

Order 15
0
Slice FFs LUTs IOBs MULT GCLKs

45
Observation from the graph has found that for transposed form of FIR filter

architecture as the filter order increases the amount of the resources usage is

increasing correspond.

Resources Usage for Symmetric FIR filter

400

350

300

250

200

150
Order 5
100
Order 10
50
Order 15
0
Slice FFs LUTs IOBs MULT GCLKs
Observation from the graph has found that for Symmetric form FIR filter architecture

as the filter order increases the amount of the resources usage is increasing

correspond.

Resources Usage for Anti-Symmetric FIR filter

600

500

400

300

200 Order 5

100 Order 10

Order 15
0
Slice FFs LUTs IOBs MULT GCLKs

46
Observation from the graph has found that for Anti-symmetric form FIR filter

architecture as the filter order increases the amount of the resources usage is

increasing correspond.

47
5.2 Simulation of the behavior model

The simulation result for the UART transmitter is correct and the behaviour model

has met the design requirement as the result shown that the most significant bit

received from the parallel port of the 8 bits register now is becoming the least

significant bit and the least significant bit received form the parallel port of the 8 bits

register now is becoming the most significant bit and so on. There is no parity check

bit and frame error bit.

Figure 5-1 Simulation behavior of the UART Transmitter

It is really hard to judge the behaviour of the FIR low pass filter form this simulation

data however the simulation behaviour is generated based on the design specification

in section 3.3 so the behaviour model should be correct.

48
Figure 5-2 Simulation behavior of the FIR filter

The simulation result for the 16 times baud rate clock is correct as it can be seen that

there is only one pulse signal for the clk16x only when 54 50MHz pulses passed so it

is correct.

Figure 5-3 Simulation behavior of the16 times baud rate clock

The simulation result for the receiver is correct.

Figure 5-4 Simulation behavior of the UART receiver

49
6 CONCLUSION

The traditional methodology for designing a filter consists of two phases: system

specification and hardware implementation. Both phases require multiple iterations

and can involve a four-to-six-week process just to ensure that a single functional

block operates to specification within the system. Creating this filter from the ground

up, using either VHDL or other design entry methods, would have required too much

development time. MatLab is an excellent tool to design filters. There are toolboxes

available to generate VHDL descriptions of the filters which reduce dramatically the

time required to generate a solution.

There is more than one way to implement the digital FIR filter. Based on the design

specification, careful choice of implementation method and tools can save a lot of

time and work. Time can be spent evaluating different implementation alternatives.

Proper computation algorithm can improve the FPGA architecture to make it

efficient in terms of speed and/or area.

As the filter order increase the amount of resources usage increase correspondently,

Non- Symmetric FIR filter took larger amount of resources than symmetric FIR filter

as the Amount of hardware component to implement the Non-Symmetric FIR filter is

bigger than the symmetric FIR filter.

About future work for the improvement of the project, digital to analogue converter,

analogue to digital converter and IIR filter can be tried to implement on the FPGA

board.

50
7 BIBLIOGRAPHY

Abdelhak M. Zoubir “Digital Signal Processing and Spectral Analysis”, Manuscript


for DSP 304, pp18, 60, 75, 1999

AccelChip, Inc., “AccelWare IP for DSP Design Targeting FPGAs and ASICs” pp5,
2004

AccelChip, Inc., “Comparison of Methods for Implementing DSP Algorithms ” pp. 5,


July 2004.

AccelChip, Inc., “Filter Design Methods for FPGAs” pp. 1-10, 2004.

AccelChip, Inc., “Automated Conversion of Floating-point to Fixed-point


MATLAB” pp.6 2004.

AccelChip, Inc., “What’s the Right Language for DSP System-level


Design?”pp3,2004

AccelChip, Inc., “MATLAB as a System Modeling Language for Hardware,” pp9,


2004.

AccelChip, Inc., “Top-Down DSP Design Flow to Silicon Implementation,” pp6-8,


2004.

Nirman Kendra, “SI40UART01 Universal Asynchronous Receiver/Transmitter Core


Factsheet” 12/20/2002

Intersil, Inc..“CMOS Universal Asynchronous Receiver Transmitter (UART)” March


1997

Memcs, Inc..“Virtex-4™ FX12 LC Development Board User’s Guide”,2005

51
8 APPENDICES

Appendix A: Background knowledge about filter design

Table 8.1The four classes of FIR filter and their associate properties

Table 8.2 Suitability of a given class of filter for the four FIR types

52
Figure 8.3(a) Kaiser windows for β = 0 (solid), 3 (semi-dashed), and 6 (dashed) and N = 21.(b) Fourier
transforms corresponding to windows in (a). (c) Fourier transforms of Kaiser Windows with β = 6 and N
= 11 (solid), 21 (semi-dashed), and 41 (dashed).

53
Appendix B: filter design by using FDATool

FDATool has been used to entry the design specification Start the FDATool by

entering the fdatool command in the MATLAB Command Window. MATLAB

displays the Filter Design & Analysis Tool dialog.

In the Filter Design & Analysis Tool dialog, check that the following filter options

are set:

Option Value

Response Type Lowpass

Design Method FIR Equiripple

Filter Order Specify order: 5

Options Density Factor: 20

Frequency Specifications Units: KHz


Fs: 10
Fpass: 2
Fstop: 3

54
Magnitude Specifications Units: dB
Apass: 1
Astop: 80

Click the Set Quantization Parameters button in the left-side tool bar. The FDATool

displays a Filter arithmetic menu in the bottom half of its dialog.

Select Fixed-point from the Filter arithmetic list. The FDATool displays the first of

three tabbed panels of quantization parameters across the bottom half of its dialog.

You use the quantization options to test the effects of various settings with a goal of

optimizing the quantized filter's performance and accuracy.

Tab Parameter Setting

Coefficients Numerator word length 16

Best-precision fraction Selected

55
lengths

Use unsigned Cleared


representation

Scale the numerator Cleared


coefficients to fully utilize

the entire dynamic range

Input/Output Input word length 16

Input fraction length 15

Output word length 16

Avoid overflow Selected

Filter Internals Round towards Floor

Overflow mode Saturate

Product mode Full precision

Accum. mode Keep MSB

Accum. word length 40

Cast signals before Selected

accum

Click Apply

Start the Filter Design HDL Coder by selecting Targets–>Generate HDL in the

FDATool dialog. The FDATool displays the Generate HDL dialog.

56
Find the Filter Design HDL Coder online help. Use the help to learn about

product details or to get answers to questions as you work with the designer.

a. In the MATLAB window, click the Help button in the toolbar or click

Help–>Full Product Family Help.

b. In the Help browser's Contents pane, select Filter Design HDL

Coder.

c. Minimize the Help browser.

Click the Help button. The FDATool displays context-sensitive help for the

dialog. As necessary, use the Help button on the other Filter Design HDL Coder

dialogs for context-sensitive help on those dialog views.

Close the Help window.

Place your cursor over the Name label or text box in the HDL filter pane of the

Generate HDL dialog and right-click. A What's This? button appears.

57
Click What's This? The Filter Design HDL Coder opens context-sensitive help

that describes the Name option. Use the context-sensitive help as needed while

using the GUI to configure options that control the contents and style of the

generated HDL code and test bench. A help topic is available for each option and

pane.

In the Name text box of the HDL filter pane, replace the default name with

basicfir. This option names the VHDL entity and the file that is to contain the

filter's VHDL code.

In the Name text box of the Test bench types pane, replace the default name

with basicfir_tb. This option names the generated test bench file.

Click HDL Options. The Filter Design HDL Coder displays an HDL Options

dialog.

In the Comment in header text box, type Tutorial - Basic FIR Filter and then

click Apply. The Filter Design HDL Coder adds the comment to the end of the

header comment block in each generated file.

58
Select the Ports tab. The Ports pane appears.

Change the names of the input and output ports. Replace filter_in with data_in

and filter_out with data_out.

Clear the check box for the Add input register option. The Ports pane should

now look like the following.

Click Apply and then OK to register your changes and close the HDL Options

dialog.

Click Test Bench Options. The Filter Design HDL Coder displays a Test Bench

Options dialog.

59
You use this dialog to customize the generated test bench.

For this tutorial, apply the default settings by clicking OK.

In the Generate HDL dialog, click Generate to start the code generation process.

The Filter Design HDL Coder displays the following messages in the MATLAB

Command Window as it generates the filter and test bench VHDL files.

### Starting VHDL code generation process for filter:


basicfir
### Generating basicfir.vhd file in: hdlsrc
### Starting generation of basicfir VHDL entity
### Starting generation of basicfir VHDL architecture
### HDL latency is 1 samples
### Successful completion of VHDL code generation process
for filter: basicfir
### Starting generation of VHDL Test Bench
### Generating input stimulus
### Done generating input stimulus; length 3429 samples.
### Generating VHDL file basicfir_tb.vhd in: hdlsrc
### Please wait ........
### Done generating VHDL test bench.
As the messages indicate, the Filter Design HDL Coder creates the directory hdlsrc

under your current working directory and places the files basicfir.vhd and

basicfir_tb.vhd in that directory.

The generated VHDL code has the following characteristics:

• VHDL entity named basicfir.

60
• Registers that use asynchronous resets when the reset signal is active

high (1).

• Ports have the following names:

VHDL Port Name

Input data_in

Output data_out

Clock input clk

Clock enable input clk_enable

Reset input
reset

• An extra register for handling filter output.

• Clock input, clock enable input and reset ports are of type

STD_LOGIC and data input and output ports are of type

STD_LOGIC_VECTOR.

• Coefficients are named coeffn, where n is the coefficient number,

starting with 1.

• Type safe representation is used when zeros are concatenated: '0' &

'0'...

• Registers are generated with the statement ELSIF clk'event AND

clk='1' THEN rather than with the rising_edge function.

• The postfix string _process is appended to process names.

The generated test bench:

• Is a portable VHDL file.

• Forces clock, clock enable, and reset input signals.

61
• Forces the clock enable input signal to active high.

• Drives the clock input signal high (1) for 5 nanoseconds and low (0)

for 5 nanoseconds.

• Forces the reset signal for two cycles plus a hold time of 2

nanoseconds.

• Applies a hold time of 2 nanoseconds to data input signals.

• Applies impulse, step, ramp, chirp, and white noise stimulus types.

When you have finished generating code, click Close to close the Generate HDL

dialog.

Low-pass FIR filter has been implemented with

62
Appendix B: Circuit diagram

Figure8-4 Circuit diagram of the overall system

63
Appendix C: Code

16 times baud rate clock

library ieee;
use ieee.std_logic_1164.all;

entity count54 is

port(clk50Mx,en:in std_logic;
clk16x:out std_logic);
end count54;

architecture count54_arc of count54 is


begin

process(clk50Mx,en)
variable count:integer range 0 to 54 :=0;
begin
if en='0' then
clk16x <= '0';
elsif (rising_edge(clk50Mx)) then
count:=count+1;
if count=54 then
clk16x<='1';
count:=0;
else
clk16x<='0';
end if;
end if;
end process;
end count54_arc;

Direct form FIR filter


LIBRARY IEEE;
USE IEEE.std_logic_1164.all;
USE IEEE.numeric_std.ALL;

ENTITY filter IS
PORT( clk : IN std_logic;
clk_enable : IN std_logic;
reset : IN std_logic;
data_in : IN std_logic_vector(7 DOWNTO 0); -- sfix8_En7
data_out : OUT std_logic_vector(7 DOWNTO 0) --
sfix8_En7
);

END filter;

64
----------------------------------------------------------------
--Module Architecture: filter
----------------------------------------------------------------
ARCHITECTURE rtl OF filter IS
-- Local Functions
-- Type Definitions
TYPE delay_pipeline_type IS ARRAY (NATURAL range <>) OF signed(7
DOWNTO 0); -- sfix8_En7
-- Constants
CONSTANT coeff1 : signed(15 DOWNTO 0) := to_signed(-4423,
16); -- sfix16_En16
CONSTANT coeff2 : signed(15 DOWNTO 0) := to_signed(18623,
16); -- sfix16_En16
CONSTANT coeff3 : signed(15 DOWNTO 0) := to_signed(29114,
16); -- sfix16_En16
CONSTANT coeff4 : signed(15 DOWNTO 0) := to_signed(29114,
16); -- sfix16_En16
CONSTANT coeff5 : signed(15 DOWNTO 0) := to_signed(18623,
16); -- sfix16_En16
CONSTANT coeff6 : signed(15 DOWNTO 0) := to_signed(-4423,
16); -- sfix16_En16

-- Signals
SIGNAL delay_pipeline : delay_pipeline_type(0 TO 4); -- sfix8_En7
SIGNAL data_in_regtype : signed(7 DOWNTO 0); -- sfix8_En7
SIGNAL product6 : signed(31 DOWNTO 0); -- sfix32_En31
SIGNAL mul_temp : signed(23 DOWNTO 0); -- sfix24_En23
SIGNAL product5 : signed(31 DOWNTO 0); -- sfix32_En31
SIGNAL mul_temp_1 : signed(23 DOWNTO 0); -- sfix24_En23
SIGNAL product4 : signed(31 DOWNTO 0); -- sfix32_En31
SIGNAL mul_temp_2 : signed(23 DOWNTO 0); -- sfix24_En23
SIGNAL product3 : signed(31 DOWNTO 0); -- sfix32_En31
SIGNAL mul_temp_3 : signed(23 DOWNTO 0); -- sfix24_En23
SIGNAL product2 : signed(31 DOWNTO 0); -- sfix32_En31
SIGNAL mul_temp_4 : signed(23 DOWNTO 0); -- sfix24_En23
SIGNAL product1 : signed(34 DOWNTO 0); -- sfix35_En31
SIGNAL mul_temp_5 : signed(23 DOWNTO 0); -- sfix24_En23
SIGNAL sum1 : signed(34 DOWNTO 0); -- sfix35_En31
SIGNAL add_temp : signed(35 DOWNTO 0); -- sfix36_En31
SIGNAL sum2 : signed(34 DOWNTO 0); -- sfix35_En31
SIGNAL add_temp_1 : signed(35 DOWNTO 0); -- sfix36_En31
SIGNAL sum3 : signed(34 DOWNTO 0); -- sfix35_En31
SIGNAL add_temp_2 : signed(35 DOWNTO 0); -- sfix36_En31
SIGNAL sum4 : signed(34 DOWNTO 0); -- sfix35_En31
SIGNAL add_temp_3 : signed(35 DOWNTO 0); -- sfix36_En31
SIGNAL sum5 : signed(34 DOWNTO 0); -- sfix35_En31
SIGNAL add_temp_4 : signed(35 DOWNTO 0); -- sfix36_En31
SIGNAL output_typeconvert : signed(7 DOWNTO 0); -- sfix8_En7

65
SIGNAL output_register : signed(7 DOWNTO 0); -- sfix8_En7

BEGIN

-- Block Statements
Delay_Pipeline_process : PROCESS (clk, reset)
BEGIN
IF reset = '1' THEN
delay_pipeline(0 TO 4) <= (OTHERS => (OTHERS => '0'));
ELSIF clk'event AND clk = '1' THEN
IF clk_enable = '1' THEN
delay_pipeline(0) <= signed(data_in);
delay_pipeline(1 TO 4) <= delay_pipeline(0 TO 3);
END IF;
END IF;
END PROCESS Delay_Pipeline_process;

data_in_regtype <= signed(data_in);

mul_temp <= delay_pipeline(4) * coeff6;


product6 <= resize( mul_temp & '0' & '0' & '0' & '0' & '0' & '0' & '0' & '0', 32);

mul_temp_1 <= delay_pipeline(3) * coeff5;


product5 <= resize( mul_temp_1 & '0' & '0' & '0' & '0' & '0' & '0' & '0' & '0', 32);

mul_temp_2 <= delay_pipeline(2) * coeff4;


product4 <= resize( mul_temp_2 & '0' & '0' & '0' & '0' & '0' & '0' & '0' & '0', 32);

mul_temp_3 <= delay_pipeline(1) * coeff3;


product3 <= resize( mul_temp_3 & '0' & '0' & '0' & '0' & '0' & '0' & '0' & '0', 32);

mul_temp_4 <= delay_pipeline(0) * coeff2;


product2 <= resize( mul_temp_4 & '0' & '0' & '0' & '0' & '0' & '0' & '0' & '0', 32);

mul_temp_5 <= data_in_regtype * coeff1;


product1 <= resize( mul_temp_5 & '0' & '0' & '0' & '0' & '0' & '0' & '0' & '0', 35);

add_temp <= resize(product1, 36) + resize(product2, 36);


sum1 <= (34 => '0', OTHERS => '1') WHEN add_temp(35) = '0' AND
add_temp(34) /= '0'
ELSE (34 => '1', OTHERS => '0') WHEN add_temp(35) = '1' AND add_temp(34)
/= '1'
ELSE (add_temp(34 DOWNTO 0));

add_temp_1 <= resize(sum1, 36) + resize(product3, 36);


sum2 <= (34 => '0', OTHERS => '1') WHEN add_temp_1(35) = '0' AND
add_temp_1(34) /= '0'
ELSE (34 => '1', OTHERS => '0') WHEN add_temp_1(35) = '1' AND
add_temp_1(34) /= '1'

66
ELSE (add_temp_1(34 DOWNTO 0));

add_temp_2 <= resize(sum2, 36) + resize(product4, 36);


sum3 <= (34 => '0', OTHERS => '1') WHEN add_temp_2(35) = '0' AND
add_temp_2(34) /= '0'
ELSE (34 => '1', OTHERS => '0') WHEN add_temp_2(35) = '1' AND
add_temp_2(34) /= '1'
ELSE (add_temp_2(34 DOWNTO 0));

add_temp_3 <= resize(sum3, 36) + resize(product5, 36);


sum4 <= (34 => '0', OTHERS => '1') WHEN add_temp_3(35) = '0' AND
add_temp_3(34) /= '0'
ELSE (34 => '1', OTHERS => '0') WHEN add_temp_3(35) = '1' AND
add_temp_3(34) /= '1'
ELSE (add_temp_3(34 DOWNTO 0));

add_temp_4 <= resize(sum4, 36) + resize(product6, 36);


sum5 <= (34 => '0', OTHERS => '1') WHEN add_temp_4(35) = '0' AND
add_temp_4(34) /= '0'
ELSE (34 => '1', OTHERS => '0') WHEN add_temp_4(35) = '1' AND
add_temp_4(34) /= '1'
ELSE (add_temp_4(34 DOWNTO 0));

output_typeconvert <= (7 => '0', OTHERS => '1') WHEN sum5(34) = '0' AND
sum5(33 DOWNTO 31) /= "000"
ELSE (7 => '1', OTHERS => '0') WHEN sum5(34) = '1' AND sum5(33
DOWNTO 31) /= "111"
ELSE (sum5(31 DOWNTO 24));

Output_Register_process : PROCESS (clk, reset)


BEGIN
IF reset = '1' THEN
output_register <= (OTHERS => '0');
ELSIF clk'event AND clk = '1' THEN
IF clk_enable = '1' THEN
output_register <= output_typeconvert;
END IF;
END IF;
END PROCESS Output_Register_process;

-- Assignment Statements
data_out <= std_logic_vector(output_register);

END rtl;

UART_Receiver

-----------------------------------------------------

library ieee ;

67
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;

-----------------------------------------------------

entity rx is
port( Din: in std_logic;
clock: in std_logic;
reset: in std_logic;
Dout: out std_logic_vector(7 downto 0);
Drdy: out std_logic
);
end rx;

-----------------------------------------------------

architecture FSM of rx is

-- define the states of FSM model

type state_type is (S0, S1);


signal next_state, current_state: state_type;
signal tmpdata:std_logic_vector(9 downto 0);
signal datardy:std_logic;
signal count: std_logic_vector(7 downto 0);
signal start: std_logic;
begin

-- cocurrent process#1: state registers


process(clock, reset)
begin
if (reset='1') then
current_state <= S0;
elsif (clock'event and clock='1') then
current_state <= next_state;
end if;

end process;

process(reset,Din,current_state)
begin
if (reset='1') or current_state=S0 then
start <= '0';
elsif current_state = S0 and (Din'event and Din='0') then
start <= '1';
end if;
end process;

process(clock, reset, current_state)

68
begin
if reset='1'or current_state = S0 then
count<="00000000";
elsif(clock'event and clock='1') then
count <= count + 1;
end if;
end process;

process(clock, count, Din,current_state)


begin
next_state<= current_state;
--if next_state<=current_state then
case current_state is

when S0 =>
if start <= '1' then
next_state <= S1;
datardy<='0';
else
next_state <= S0;
end if;
when S1=>
if count=x"18"then
next_state <= S1;
tmpdata(0)<=Din;
datardy<='0';

elsif count=x"28" then


next_state <= S1;
tmpdata(1)<=Din;
datardy<='0';

elsif count=x"38"then
next_state <= S1;
tmpdata(2)<=Din;
datardy<='0';

elsif count=x"48"then
next_state <= S1;
tmpdata(3)<=Din;
datardy<='0';

elsif count=x"58"then
next_state<= S1;
tmpdata(4)<=Din;
datardy<='0';

elsif count=x"68"then
next_state<= S1;

69
tmpdata(5)<=Din;
datardy<='0';

elsif count=x"78"then
next_state<= S1;
tmpdata(6)<=Din;
datardy<='0';

elsif count=x"88"then
next_state<= S1;
tmpdata(7)<=Din;
datardy<='0';
dout <= tmpdata(7 downto 0);
elsif count=x"95"then
next_state<= S1;
datardy<='1';

elsif count=x"98"then
next_state<= S1;
datardy<='1';

elsif count=x"9D"then
next_state<= S0;
datardy<='0';
else
next_state<= S1;
end if;
when others =>
next_state<= S0;
end case;
-- end if;
end process;
Drdy <= datardy;

end FSM;

UART_Transmitter
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;

entity Tx is
port ( clk16 : in std_logic; -- Input clock with 50 MHz clock signal
wr : in std_logic; -- Write strobe signal
rst : in std_logic; -- Reset button
din : in std_logic_vector(7 downto 0); -- 8 Bits data bus
Input
sdo : out std_logic; -- Serial transmit data line
tx_rd : out std_logic);-- Transmitter ready to Tx next byte

70
end Tx;

architecture Tx_arc of Tx is

signal txbr : std_logic_vector(7 downto 0);-- 8 Bits buffer register


signal txsr : std_logic_vector(7 downto 0);-- 8 Bits serial register
signal txd2 : std_logic; -- tag bits for detecting
signal txd1 : std_logic; -- empty shift register
signal clk : std_logic; -- Transmit clock or baud rate clock
signal txdone : std_logic; -- '1' when shifting of byte is done
signal txrd : std_logic; -- '1' when data is ready in txhold

begin

process (rst, clk16) -- Toggle clk every 8 counts, which divides the clock
frequency by 16
variable count: std_logic_vector(2 downto 0);
begin
if rst='1' then
clk <= '0' ;
count := (others=>'0') ;
elsif clk16'event and clk16='1' then
if (count = "000") then
clk <= not clk;
end if;
count := count + "001";
end if;
end process;

process (rst, clk)


begin
if rst='1' then
txsr <= (others=>'0') ;
txd1 <= '0' ;
txd2 <= '0' ;
sdo <= '0' ;
elsif clk'event and clk = '1' then
if (txdone and txrd) = '1' then -- Initialize registers and load next byte of
data
txsr <= txbr; -- Load tx register from txbr
txd2 <= '1'; -- Tag bits for detecting
txd1 <= '1'; -- when shifting is done
sdo <= '0'; -- Start bit to the serial port
else
txsr <= txd1&txsr(7 downto 1); -- Shift data
txd1 <= txd2;
txd2 <= '0';
if txdone = '1' then -- Shift out data

71
sdo <= '1'; -- Stop or idle bit
else
sdo <= txsr(0); -- Shift data bit
end if;
end if ;
end if;
end process;
txdone <= not(txd2 or txd1 or txsr(7) or txsr(6) or txsr(5) or txsr(4) or
txsr(3) or txsr(2) or txsr(1) or txsr(0)); -- txdone = 1 when done
shifting (When txtag2 has reached tx)

process (rst, clk16)


variable wr1,wr2: std_logic; -- write signal delayed 1 and 2 cycles or current
state and preivous state
variable txdone1: std_logic; -- txdone signal delayed one cycle
begin
if rst='1' THEN
txrd<= '0' ;
wr1 := '0' ;
wr2 := '0' ;
txdone1 := '0' ;
elsif clk16'event and clk16 = '1' then
if wr1 = '0' and wr2= '1' then -- Falling edge on write signal. New
data in txbr register
txrd <= '1';
elsif txdone ='0' and txdone1 = '1' then -- Falling edge on txdone signal.
Txbr has been read.
txrd <= '0';
end if;
wr2 := wr1; --
Delayed versions of write and txdone signals for edge detection
wr1 := wr ;
txdone1 := txdone;
end if ;
end process;

txbr <= din when wr = '1'else txbr; -- Latch data bus during write
tx_rd <= not txrd; -- Transmitter ready for write when no data
is in txbr
end Tx_arc;

72

You might also like