
Högskolan i Skövde

Department of Computer Science

Constant False Alarm Rate Detection


of Radar Signals with
Artificial Neural Networks

Mirko Kück
mirko@ida.his.se

Final
6 October, 1996
Submitted by Mirko Kück to the University of Skövde as a dissertation towards the
degree of M.Sc. by examination and dissertation in the Department of Computer
Science.


I certify that all material in this dissertation which is not my own work has been identified
and that no material is included for which a degree has already been conferred upon me.

Signed
Abstract
In conventional radar systems, radar signal detection is mostly based on statistical
methods. These methods are computationally expensive and often only optimal for one
type of clutter distribution. More recent approaches to radar signal detection also
consider artificial neural networks as detectors. Up to now, these approaches have only
treated environments with homogeneously distributed clutter. This project addressed
mixed clutter environments. The experiments with different artificial neural network
architectures showed that only a multilayer perceptron architecture could solve the task
of radar signal detection. This architecture achieved a good performance in mixed
clutter environments compared to a conventional detector.
Table of Contents

1. INTRODUCTION
1.1 Radar Signal Detection with ANNs
1.2 MSc Project
1.3 Outline

2. RADAR SIGNAL DETECTION
2.1 Conventional CFAR detection

3. ARTIFICIAL NEURAL NETWORKS
3.1 Background
3.2 Backpropagation Training Strategy
3.2.1 Learning rate
3.2.2 Momentum
3.2.3 Generalisation Capabilities and Overtraining
3.2.4 Hyperplanes
3.3 Architectures
3.3.1 Multilayer Perceptron
3.3.2 Simple Recurrent Network
3.3.3 Modular Approaches

4. PREVIOUS WORK
4.1 ANN Approaches

5. ANN-CFAR DETECTION
5.1 Two Layer ANN
5.1.1 CA-CFAR detector
5.1.2 ANN-CFAR detector
5.2 Multilayer ANN

6. EXPERIMENTS
6.1 Weibull PDF
6.2 SCR, PFA and PD
6.3 Threshold Calculation
6.4 Performance Criteria (CFAR-loss)
6.5 Training Set
6.5.1 Background Clutter Distribution
6.5.2 Target Signal Scenarios
6.6 Training the MLP
6.7 Approaches
6.7.1 Multilayer Perceptron Approach
6.7.2 Simple Recurrent Network Approach
6.7.3 CN Approach

7. EVALUATION
7.1 Specialised hidden units compared with direct training
7.2 Limited learning abilities of SRNs
7.3 Comparison between MLP and CN architecture
7.4 Comparison between ANN-CFAR and CA-CFAR

8. CONCLUSION
8.1 Approaches
8.2 Achievements
8.3 Future work

9. REFERENCES
Preface
All terms written in italic letters are defined in the appendix “Definitions”. The final
version of this document will be available on the world wide web
(http://www.ida.his.se/~mirko/thesis) and will contain hyperlinks to references.
1. Introduction
The MSc project documented in this thesis was carried out in co-operation with the
Connectionist Research Group at the University of Skövde and Ericsson Microwave
Systems AB.

1.1 Radar Signal Detection with ANNs

Electronic signal filtering is a complex task, which requires complicated specialised
hardware or computationally expensive signal processing. In radar systems the target
signals have to be separated from unwanted background clutter. This kind of filtering can
be considered as a pattern recognition problem, which fits very well to the possibilities an
artificial neural network (ANN) offers.

In the past, there have been several approaches to detect radar signals in unwanted
background clutter. These approaches used extensive statistical computations, which
were often only applicable to one kind of background clutter (homogeneous clutter);
Farina et al. (1986), Barton (1988). In the last few years, researchers have also picked up
the possibility of problem solving with artificial neural networks and used them for radar
signal detection; Bucciarelli et al. (1993), Amoozegar et al. (1994), Ramamurti et al.
(1993). These approaches delivered better and more robust results than conventional
statistical approaches.


1.2 MSc Project

The goal of this MSc project is to train ANNs to perform constant false alarm rate
(CFAR) detection of radar signals in mixed clutter environments. The project will
investigate if and how ANNs can be trained to cope with mixed clutter environments and
which architectures are suitable for this kind of problem. The performance of ANN
detectors will be compared with conventional approaches, and finally suggestions for
future research will be given.

1.3 Outline

Chapter 2 introduces the problem of signal detection in radar systems and a conventional
method for detection. Chapter 3 explains the function and architectures of ANNs.
Afterwards, Chapter 4 considers previous applications of ANNs to radar signal detection.
Chapter 5 then explains the relationship between conventional methods and a simple
ANN approach for radar signal detection. In Chapter 6 experiments are described that
aim to achieve radar signal detection in mixed clutter environments with ANNs. This
chapter also describes the underlying distribution of the data set taken for the
experiments. The approaches taken in the experiments are evaluated in Chapter 7. In
Chapter 8 the project is concluded by placing the results from previous work in context
with the achievements of this project. Suggestions for future research are also given.

2. Radar signal detection
A stream of radar pulses is sent out toward a target. As they are reflected by the target,
energy is returned from the target to the radar detector. The return signal is plotted as a
function of time delay which is proportional to the range (i.e. the target’s distance from
the radar). This continuous time delay function is transformed into discrete time steps,
where each time step corresponds to a discrete range. The signal amplitude for each
range is stored in separate range cells which can be viewed in a range profile. An
example is shown in Figure 1.
Figure 1, example range profile (signal amplitude plotted against range; target signals appear as peaks)

Unfortunately, the radar system receives not only echoes from targets, it also receives
noise. This noise consists of thermal noise and background clutter. Background clutter
arises from reflections from buildings, hills, mountains, forests, waves on water etc.


Figure 2, non-homogeneous environment (mountains, water, buildings and a target at different ranges)

Furthermore, this background clutter may vary over range but also in time due to changes
in the environment.

Small targets at a great distance produce only weak echoes, so weak that they can be
completely lost in the clutter. As the target’s range decreases, the strength of the
received target signal increases, until the target signal becomes strong enough to be
detected above the clutter, Stimson (1983).

There are different approaches to develop radar signal detectors that are optimum in
some sense, such that false alarms (false detections) can be kept at a suitably constant
low rate in an a priori unknown, time varying and spatially non-homogeneous
environment as shown in Figure 2. These approaches are called constant false alarm rate
(CFAR) detection. In these approaches the environment is modelled by distribution
functions. The fact that the environment is time varying restricts modelling, because the
environment transitions are a priori unknown. To limit the detector’s complexity,
detectors are normally designed for an approximation of one known environment.

A simple way to separate target signal returns from background clutter is to apply a
threshold.


Figure 3, threshold (amplitude vs. range; a threshold separates the target signals from the background clutter)

The distribution of the clutter can be approximated by certain probability distribution
functions, where each medium follows a different probability distribution. For example,
clutter returns from buildings follow a different probability distribution than clutter
returns from mountains.

Not only does the clutter distribution vary, but also the clutter level (scale) strongly
varies. Detectors that cope with these variations have to have an adaptive threshold. The
theory of threshold detectors that are optimal for one type of clutter distribution
(homogenous clutter) is well understood if thermal noise and clutter are complex
Gaussian distributions, Farina (1987).

Unfortunately, this requires a priori information about the statistical distributions, which
in practice are not known completely a priori.


2.1 Conventional CFAR detection

If the target reflections get strong enough to be detected over the clutter, one can use a
fixed threshold to separate them from the background clutter. Unfortunately, this leads
to a huge amount of false alarms, if the clutter level changes and the clutter appears with
a higher amplitude, as illustrated in Figure 4.
Figure 4, Fixed Threshold (amplitude vs. range; when the clutter level, e.g. over water/ground, rises above a fixed threshold, the threshold produces false alarms as well as detections of the target signal)

In order to achieve CFAR detection in such a scenario, one needs a variable threshold
which is adaptive to different clutter levels. One kind of adaptive threshold detector is
the cell averaging CFAR (CA-CFAR) detector, described by Barton (1988).


Figure 5, CA-CFAR detector (the average $\frac{1}{N}\sum x_n$ of the reference cells is scaled by $\alpha$ and compared with the cell under test; the comparator outputs a detection decision 0/1)

The input is supplied to a shift register, where the cell under test (CUT) is surrounded by
a set of neighbouring range cells. The CA-CFAR detector calculates the average of the
surrounding range cells and multiplies it with a fixed value α to get the threshold. The
threshold is then compared with the value of the CUT. If this value exceeds the
threshold, a target detection is given. Then the contents of the shift register are shifted
by one position and the detection process is repeated for the next range cell. The shift
register can be considered as a reference window which slides over the stream of range
cells.
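As a minimal sketch, the sliding-window decision rule just described can be written as
follows (the window size, the scale factor and the data layout are arbitrary illustrative
assumptions, not values used in this thesis):

import numpy as np

def ca_cfar(cells, n_ref=16, alpha=3.0):
    # cells: 1-D array of range-cell amplitudes
    # n_ref: number of reference cells on each side of the cell under test
    # alpha: fixed scale factor applied to the cell average
    detections = np.zeros(len(cells), dtype=int)
    for cut in range(n_ref, len(cells) - n_ref):
        # cell average over the reference window surrounding the CUT
        ref = np.r_[cells[cut - n_ref:cut], cells[cut + 1:cut + 1 + n_ref]]
        threshold = alpha * ref.mean()
        detections[cut] = int(cells[cut] > threshold)
    return detections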

A problem with the CA-CFAR is the possibility of another interfering target in the
reference window. The interfering target as depicted in Figure 6 gets included in the
averaging and increases the threshold, so that the target in the CUT may not be detected.

Another problem with the CA-CFAR is that a clutter edge as shown in Figure 7 may
produce false alarms due to “cutting” the clutter edge with the threshold.


Figure 6, Target masking (amplitude vs. range; an interfering target inside the reference window raises the threshold above the target in the cell under test)

Figure 7, Clutter edge (amplitude vs. range; the threshold “cuts” the rising clutter edge within the reference window, producing false alarms)

3. Artificial Neural Networks
Most of the current generation of artificial neural networks are characterised by a
multilayer architecture, with several layers of neurons. The backpropagation algorithm
from Rumelhart et al. (1986) is a well-known and largely successful learning algorithm
for supervised learning networks, where input and target domain are known a priori. As
opposed to supervised learning, an unsupervised learning algorithm blindly or
heuristically classifies the input domain and often has less computational complexity
than supervised learning.

Multilayer architectures have proven to be functionally capable of solving impressive
problems in information storage, processing, and transformation. They can be used for
density estimation, classification and regression problems as shown by Jordan (1996).
They may not, however, be the best neural techniques for a given problem. In spite of
these increased capabilities, they remain restricted to certain classes of problems, since
they are difficult to train and to scale.

3.1 Background

Biological neurons transmit electrochemical signals over neural pathways. Each neuron
receives signals from other neurons through special junctions called synapses. Some
inputs tend to excite the neuron, others tend to inhibit it. When the cumulative effect
exceeds a threshold, the neuron fires (becomes active) and sends a signal to other
neurons.


Artificial neural networks model this behaviour in a simple way. Each neuron has a set of
inputs and one output. The incoming signals (i1,…,in) are multiplied independently by a
weight (w1,…,wn). The sum of all weighted input signals determines the activation of the
neuron (net). To produce the output signal the activation is further processed by an
activation function (F) which can for example be a threshold, linear or sigmoid function.
Figure 8, computational model of a neuron (the incoming signals $i_1, \ldots, i_n$ are multiplied by the weights $w_1, \ldots, w_n$; the activation $net = \sum_{i=1}^{n} i_i \cdot w_i$ is passed through the activation function F)
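A minimal sketch of this computation, assuming a sigmoid activation function (one of the
choices mentioned above):

import math

def neuron_output(inputs, weights):
    # activation: weighted sum of all incoming signals
    net = sum(i * w for i, w in zip(inputs, weights))
    # activation function F, here a sigmoid
    return 1.0 / (1.0 + math.exp(-net))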

The whole network consists of sets (layers) of neurons, which are connected. A network
consists of one input and one output layer; in addition the network can also contain one
or several hidden layers. The input layer buffers the inputs and has no weights. All
following layers (hidden and output layers) have weights on their inputs. A network with
a single layer of weights therefore has two layers, one input and one output layer.

If the information flow through the network is only in one direction, without feedback
loops, we call it a feedforward network. A neural network with feedback connections is
called recurrent.


Figure 9, feedforward and recurrent neural network (input, hidden and output layers; the recurrent network adds a feedback connection)

Solving a problem with an ANN generally involves the following steps:

• Select a network topology which fits to the nature of the problem

• Choose the activation functions which are appropriate for the nature of the problem

• Train the neural network with a training procedure

3.2 Backpropagation Training Strategy

Neural networks using supervised learning techniques are based on mapping input
patterns to output patterns. In order to recognise patterns, the network needs to be
trained with these patterns. To train the network one has to use a training algorithm. One
well-known algorithm, widely used among current types of neural networks, is the
backpropagation training algorithm, Rumelhart et al. (1986). The name backpropagation
comes from the fact that the error gradients of hidden units are derived by propagating
backward the errors associated with output units, since target values are given only for
output units and not for hidden units. The errors are computed as some function of the
distance between actual output and target output.

The output of a neuron is calculated as follows:

$$o_j = F\!\left(\sum_{i} w_{ij}\, o_i\right) \quad \text{with } o_i = \text{previous layer output (unit input)},\ o_j = \text{current layer output}$$


where F is the activation function, for example a sigmoid function given by

$$F(x) = \frac{1}{1 + e^{-x}}$$

Backpropagation learning works as follows: First, the output vector resulting from a
given input vector is calculated. Then the output vector is compared with the target
vector for the actual input. The error between target and output vector is then propagated
backwards through the net and results in an adjustment $\Delta w_{ij}$ for each weight.
The adjusted weight is

$$w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}$$

which should bring the output closer to the desired output (target vector). The
adjustment is calculated by the generalised delta rule for multilayer networks, see
Rumelhart et al. (1986). The weight adjustment is proportional to the error and leads to
no adjustment if the input is zero.

One limitation is that the backpropagation algorithm is prone to local minima, because it
uses a gradient descent strategy in order to minimise the error criterion. Gradient
descent corresponds to performing steepest descent on a surface in weight space whose
height at any point is equal to the error measure. If a point is reached where the
gradient gets zero, a minimum is found and no further weight changes are made. This
minimum can either be the global minimum over the whole weight space or a local
minimum in one region of the weight space. If a local minimum is reached, the algorithm
is stuck and has no ability to reach the global minimum or the error criterion respectively.

3.2.1 Learning rate

A careful selection of the step size for weight adjustment (the learning rate η) is often
necessary to ensure smooth convergence. Large step sizes can cause the network to
become paralysed. When network paralysis occurs, further training does little for
convergence. On the other hand, if the step size is too small, convergence can be very slow.

3.2.2 Momentum

A large learning rate η corresponds to rapid learning but might result in oscillations. On
the other hand, some researchers, Rumelhart et al. (1986), have shown that a small η
may result in a failure to learn. Further, Rumelhart et al. (1986) proposed adding a
momentum term to the delta rule in the following form:

$$\Delta w_{ji}(n+1) = \eta \delta_j o_i + \alpha \Delta w_{ji}(n)$$

where $\delta_j$ is the error signal (target vector − output vector), α, called the momentum
ratio, is a constant, and the indices (n+1) and (n) indicate the (n+1)th and nth step. The
momentum term specifies that the weight change in step (n+1) should be somewhat
similar to the change in the nth step. This modified delta rule was used in the
experiments described later.
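A minimal sketch of this update, assuming the error signal δj and the unit output oi have
already been computed for the current pattern (names and parameter values are
illustrative):

def update_weight(w, prev_delta_w, delta_j, o_i, eta=0.1, alpha=0.9):
    # generalised delta rule with momentum:
    # delta_w(n+1) = eta * delta_j * o_i + alpha * delta_w(n)
    delta_w = eta * delta_j * o_i + alpha * prev_delta_w
    return w + delta_w, delta_w  # new weight and the stored weight change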

3.2.3 Generalisation Capabilities and Overtraining

Generalisation is concerned with how well the network performs on the problem with
respect to both seen and unseen data.

During the training, the process of adjusting the weights based on the gradients is
repeated until the error criterion is reached or continued training would not reduce the
error any further. Overtraining the network will make it memorise the individual input-
output training pairs (training set) rather than adapting to common features within the
data. This issue arises because the network matches the data too closely and loses its
generalisation ability over unseen data (test data). In practice, one has to choose a
stopping condition, typically an error goal.


3.2.4 Hyperplanes

A hyperplane separates the input space into two decision regions. Each neuron of the
same layer can form one hyperplane. For example, with two neurons the input space
(e.g. Colour, Shape) can be separated into maximally four regions (c, d, e, f). In this
example the attributes of the input space belong to either one of two fruit types.

Figure 10, hyperplanes of two neurons (the two hyperplanes divide the input space Colour × Shape into the regions c, d, e, f; each region belongs to fruit type I or fruit type II)

To combine the attributes of regions c and e, which are of the same fruit type, a
following layer of units can then separate the output result (c, d, e, f) of the previous
layer further into decision regions (I, II), as shown in the next figure and in the sketch
after it. These decision regions are in this case the correct classification of attributes
into fruit types.

Figure 11, separation of previous layer output result (a second layer of units separates the first layer’s outputs c, d, e, f into the decision regions I and II)
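A minimal sketch of this two-stage separation; the weights, thresholds and the assignment
of regions to labels below are arbitrary illustrative assumptions:

def region(colour, shape):
    # layer 1: each neuron implements one hyperplane over (Colour, Shape)
    h1 = int(colour + 0.5 * shape > 1.0)
    h2 = int(0.5 * colour + shape > 1.0)
    # the pair (h1, h2) identifies one of maximally four regions
    return {(0, 0): 'c', (1, 0): 'd', (0, 1): 'f', (1, 1): 'e'}[(h1, h2)]

def fruit_type(colour, shape):
    # layer 2: combine regions of the same fruit type into one decision region
    return 'I' if region(colour, shape) in ('c', 'e') else 'II'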


3.3 Architectures

Three different types of architectures will be considered:

• Multilayer Perceptron (MLP)

• Simple Recurrent Network (SRN)

• Modular Approaches

3.3.1 Multilayer Perceptron

As mentioned earlier, artificial neural networks are based on mapping input patterns onto
output patterns. Using an MLP architecture, as illustrated in Figure 12, the mapping
between input and output patterns can be arbitrarily non-linear, if the network has
sufficient hidden units and hidden layers.

Figure 12, MLP architecture (computation flows in one direction, from input to output)


To represent temporal dependencies within the network, the MLP can learn the
transitional probabilities from one input to the next. On the other hand dependencies
between patterns that span more than two inputs cannot be learned. A different way to
represent time in MLPs is to present it as an additional dimension of the input. If one
wants to compute a temporal sequence with an MLP, this sequence must be presented as
overlapping time slices to the network. A relative temporal position between two
network input vectors (v1, v2) as considered in Elman’s (1990) example,


v1: 0011110000

v2: 0000111100

will be treated by the network as quite dissimilar input vectors. MLPs can, of course, be
trained to treat these patterns as similar, but the similarity is then a consequence of the
external teacher and not of the relative temporal position, and will not lead to
generalisation to novel patterns.

3.3.2 Simple Recurrent Network

The ability to process sequences is crucial for many tasks. It is needed whenever the
order of events is relevant or the current state depends on previous states. Speech
recognition or time series prediction are some examples. The objective is to enable
ANNs to handle temporal information between more than one state transition.

Figure 13, SRN architecture (feedforward computation plus context units fed back from the hidden layer)


Simple recurrent networks (SRNs) represent time implicitly in their structure and not as
an additional dimension of the input. To achieve this, the network must be given
memory, which is done in Elman’s (1990) simple recurrent network, see Figure 13, by
feeding back the hidden representation as extra input units to the hidden layer itself.
These extra input units are called context units, because they represent the context of the
previous steps and provide the network with memory.
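One forward step of such a network can be sketched as follows; after each step the
hidden activation is copied into the context units (the sigmoid and the weight matrix
names are assumptions):

import numpy as np

def srn_step(x, context, W_in, W_context, W_out):
    # the hidden layer sees the current input and the previous hidden state
    hidden = 1.0 / (1.0 + np.exp(-(W_in @ x + W_context @ context)))
    output = 1.0 / (1.0 + np.exp(-(W_out @ hidden)))
    return output, hidden  # hidden becomes the next step's context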

SRNs are also applicable to other problems than temporal sequences. Other problems
can be decomposed into sequences, where the network learns for example structures of


letters in sequences. Even in cases where a complete prediction of the output sequence is
not possible, the network can learn to predict partial sub-regularities, as shown by Elman
(1990).

The advantage of feedback in a network is that the history of inputs affecting the output
is not strictly limited as in time window approaches. Another advantage is that SRNs are
smaller than MLPs for time dependencies, because the temporal dimension is implicit in
the structure and not, as in MLPs, represented in the spatial dimension.

3.3.3 Modular Approaches

Modularity is a technique widely used in computer science: a problem is decomposed
into sub-problems, and each sub-problem is solved by a separate module that is expected
to be simpler than the overall task. This can also be applied to ANNs, where each module
is a single ANN specialised on one part of the whole problem. By combining or switching
between the results of each module, a final result can be obtained. This combining or
switching mechanism can be implemented in different ways.

Modular Neural Networks

Figure 14, Modular Neural Network (modules 1 to 3 receive the input and each has its own supervision; module 4 combines their outputs into the final output, also under supervision)


The four modules shown in Figure 14 are trained simultaneously and supervision is
provided to each module. Modules one to three can be supplied with all inputs or only
those inputs that each specialised module needs for its task. Module four receives the
outputs from the other modules and is trained to combine these outputs into a final result.
One example could be to use the advantages of specialised ANNs trained to estimate
single parameters of a probability distribution function (e.g. variance, mean, median). A
fourth module could combine these parameters and give an estimate of the underlying
probability distribution function.

Adaptive Mixture of Experts

Figure 15, Adaptive Mixture of Experts (expert networks 1 to 3 map the input to outputs s1, s2, s3; a gating network produces gate outputs g1, g2, g3 with g1 + g2 + g3 = 1; the final output is s = g1·s1 + g2·s2 + g3·s3)

This modular architecture was introduced by Jacobs et al. (1991). It uses different
networks to learn training patterns from different regions of the input space. The
network consists of two types of modules as illustrated in Figure 15: Expert networks
and a gating network. The expert networks compete to learn the training patterns and the
gating network mediates the competition. The gating network has as many outputs as
there are expert networks, and the activations of these output units must be non-negative
and add up to one. That means that the output of the gating network is the portion each
expert has of the final result (output of the network).
Cascaded Networks

Figure 16, Cascaded Network (the context network computes, from its own input, the weights of the function network, which maps its input to the output)

Breaking a problem into sub-problems leads to modules that have to be implemented in
separate ANNs. A cascaded network (CN) as introduced by Pollack (1987) offers a
possibility to ‘combine’ these modules within one network. The network consists of a
function network and a context network. Training the function network separately for
each sub-problem leads to multiple sets of weights, one for each sub-problem. The
context network computes as its outputs the weights of the function network. If each sub-
problem can be indicated by some input features, the context network can then be
trained to map these states or patterns onto the specific weight set for each sub-problem.
This mechanism leads to a network which can implement multiple functions with weights
that are modified dynamically during computation. Running all sub-problems first could
make the mapping between input features and weight sets difficult due to edge
constraints between the sets. Merging the sub-problems removes these constraints and
forms the training as follows. All input patterns are passed through the function network
and after each pass of an input pattern, the output error is propagated backward through
the net to compute the weights of the function network. These weights are then used in
the same step as target patterns for the context network to compute its weights by
backpropagation. As input of the context network any feature can be chosen (i.e. the
same input as to the function network, only some specific input units, the internal state of
the function network, etc.). Pollack (1987) built, for example, a 4-to-1 multiplexer using a
cascaded network. According to a two bit address, one of four data lines should be
switched to the output. Pollack used the two bit address as input of the function network
and the four data lines as input of the context network. Training the cascaded network
converges in general much quicker than a feed-forward solution (6 input, 4 hidden and 1
output unit) where there is no distinction between all six inputs.
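The key mechanism, the context network computing the weights of the function network,
can be sketched as follows; a linear context network and a single-layer function network
are simplifying assumptions:

import numpy as np

def cascaded_forward(func_input, context_input, W_context):
    # the context network maps its input onto a weight set ...
    func_weights = W_context @ context_input
    # ... which the function network then uses for this pattern
    net = np.dot(func_weights, func_input)
    return 1.0 / (1.0 + np.exp(-net))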

4. Previous Work
Radar signal detection is a complex task, where a lot of actions have to be done on
separate levels to achieve CFAR detection. A radar signal processing system looks in
general as shown in Figure 17, Farina (1986) and Bogler (1990).

Figure 17, Radar Signal Processing (Antenna → Receiver → Digital Signal Processor → Data Processor → Display)

The antenna receives reflected radar signals, which are then amplified and filtered in the
Receiver. The received input stream is then stored in range cells and the digital signal
processor (DSP) performs target detection and false alarm control on it. Modern radars
possess very sophisticated feedback logic to optimise the detection performance. This is
done by correlating the new radar measurements with previous target tracks in the Data
Processor, to make sequential predictions about tracks.

There are a lot of research activities in radar signal processing. Most of these research
activities apply conventional statistical methods to detect signals within noisy
environments. Over the last decade, these research activities also picked up the new
possibilities of problem solving with ANNs. ANNs have been successfully used in radar
image processing, like automatic target recognition, Grossberg (1995).

This project focuses on the action taking place in the DSP, detecting single targets, to
achieve CFAR detection. Therefore ANN approaches used in the receiver or data
processor will only be mentioned but not further described in later chapters.


4.1 ANN Approaches

• In most radar systems, a series of M short pulses is sent out in one direction. The
received radar pulse has a complex nature (e.g. signal amplitude and phase), where
the whole received signal consists of M complex pulses. Due to the complex nature of
the received signal, it is possible to calculate if the received signal is a target reflection
or clutter reflection for each range cell. To improve the detection rate
Bucciarelli et al. (1993) trained an ANN with complex inputs and activation functions
and compared their results with the performance of optimal detectors for gaussian
clutter. They focused in their study mainly on the comparisons of different signal to
clutter ratios with the result that the complex ANN receiver performs very closely to
the optimum receiver for gaussian clutter.

• Most CFAR detectors do not focus on the level of complex signals for single range
cells. They consider a set of range cells after pre-processing the complex return
signals as earlier described in section 2.1. A problem with conventional detectors is
that they have bad performance if the number of available reference cells decreases.
This may be caused by several factors, primarily from constraints on the radar system
used (e.g. resolution, presence of interfering targets, dead cells, clutter
discontinuities). The robustness against a loss of reference cells with an ANN
approach is considered by Amoozegar et al. (1994). They focused on a multilayer
feed-forward network with different reference window sizes and different signal to
clutter ratios (SCR) and compared their results with conventional detectors. As input
parameters they used the cells in the reference window, but also statistical input
features like the statistical mean over the reference window, the average power and
the variance. They pointed out that ANN detectors consistently provide a superior
performance compared to conventional detectors, when the size of reference window
is small. Further, they mentioned similar robust performance evaluations under other
conditions, viz. presence of interfering targets, dead cells (cells with no or corrupted
value) and clutter discontinuities. Also Ramamurti et al. (1993) showed that their
approach with a multilayer neural network and only the reference window as an input
vector yields considerable performance improvements in non-gaussian noise over
locally optimum detectors.

• Considering the radar image as a whole is the last step in the radar signal process.
This approach belongs in the category of image processing where target tracking is
used to filter out targets and where ANNs have been successfully used according to
Grossberg (1995). In this literature review this area will not be considered any further,
since it would go beyond the scope of this project.

All former approaches tend in the direction of robustness in non-Gaussian noise
environments. Further research is necessary considering radar detection problems in
homogeneous clutter (e.g. different clutter levels, clutter edges and interfering targets)
and also in mixed clutter environments (e.g. different clutter distributions). An
investigation is also necessary that considers the internal behaviour of ANN detectors
(e.g. why and how the detection process works with ANNs).

This project should extend the research of Ramamurti (1993) and Amoozegar and
Sundareshan (1994) and investigate if artificial neural networks (ANNs) can be trained
on radar signal detection in mixed clutter environments and perform better than
conventional radar detectors.

5. ANN-CFAR Detection
This chapter will first explain the relation between a single layer ANN and a CA-CFAR
detector. Then it will focus on MLPs and will moreover relate different architectures to
the problems of CFAR detection.

5.1 Two Layer ANN

First of all it will be shown why an ANN with n input units and one output unit (a single
layer of weights) is able to solve the same task of radar target signal detection as a cell
averaging (CA)-CFAR detector does. If this simple ANN is able to do the task of radar
signal detection, then a more complex multilayer architecture as discussed in the next
section must at least be able to do the same task.

5.1.1 CA-CFAR detector

A CA-CFAR detector has as input the amplitudes of the cell under test (CUT) and a set
of surrounding reference cells. The CA-CFAR calculates the average value of all
reference cells and multiplies it by α to build the threshold just above the highest peaks
of the background clutter distribution. This of course, can only be optimal for one kind
of distribution. The calculation of α will be described in Chapter 6.3 with respect to
probability distribution functions.


Figure 18, CFAR-Detector (the averages of the $N$ leading and $M$ trailing reference cells are combined as $\frac{1}{2N}\left(\sum_{i=1}^{N} x_i + \sum_{i=N+1}^{N+1+M} x_i\right)$, scaled by $\alpha$ and subtracted from the CUT; an output $> 0$ means target signal (1), $\leq 0$ means no target signal (0))

5.1.2 ANN-CFAR detector

A simple ANN-CFAR detector is shown in Figure 19. It has one input layer and one
single output unit with a squashing function which converts values less than or equal to
zero to a zero output (no target present) and values greater than zero to an output of 1
(target present). Let us assume the weights on the connections from the reference cells
are manually initialised with $-\frac{1}{n-1}$. If further the weight on the connection from the
CUT is initialised with 1, the ANN will produce an output greater than zero if the
CUT contains a value greater than the average value of all surrounding cells. To detect
as target signals only amplitudes that are higher than the clutter level, and therefore
achieve a low rate of false alarms, the weight on the connection from the CUT has to
have a smaller value than 1.


Figure 19, ANN-Detector (the N and M reference cells and the CUT are connected to a single output unit)

The activation state of the output unit (before applying the squashing function) for such
an ANN with a single layer of weights is given by:
$$\sum_{i=1}^{n} w_i x_i \quad \text{where } w_1, \ldots, w_N \text{ and } w_{N+1}, \ldots, w_{N+1+M} = \frac{1}{n-1} \text{ and } w_{CUT} = \alpha$$

If one splits this equation up, one gets for the cell average:

$$\sum_{i=1}^{N} w_i x_i + \sum_{i=N+1}^{N+1+M} w_i x_i = \frac{1}{n-1}\left(\sum_{i=1}^{N} x_i + \sum_{i=N+1}^{N+1+M} x_i\right) = \frac{1}{2N}\left(\sum_{i=1}^{N} x_i + \sum_{i=N+1}^{N+1+M} x_i\right) \quad \text{with } N = M,$$

which is the same result as that of the cell averaging (CA)-CFAR detector and therefore
shows that both detectors theoretically should have the same performance.
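This equivalence is easy to check numerically. A minimal sketch with an arbitrary window
(N = M = 15, so n = 31) and one algebraically equivalent choice of weights (weight 1 on
the CUT, −α/(n−1) on each reference cell):

import numpy as np

rng = np.random.default_rng(0)
window = rng.weibull(2.0, size=31)   # 15 reference cells + CUT + 15 reference cells
cut = 15
refs = np.r_[window[:cut], window[cut + 1:]]

# CA-CFAR decision: CUT compared with alpha times the cell average
alpha = 2.9657
ca_cfar_decision = window[cut] > alpha * refs.mean()

# single-layer ANN: weighted sum followed by a threshold at zero
weights = np.full(31, -alpha / 30.0)
weights[cut] = 1.0
ann_decision = np.dot(weights, window) > 0.0

assert ca_cfar_decision == ann_decision  # both detectors decide identically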

5.2 Multilayer ANN

An ANN-CFAR detector such as considered above, with one output unit and no hidden
layer, has no advantage over any other conventional detector, but it is useful to show
the similarities. The question of whether ANNs can gain advantages over conventional
detectors by using more complex architectures will be discussed next.


The fact that background clutter does not follow any distribution exactly, and can only be
approximated by some distribution functions, leads to problems with conventional CFAR
detectors. The ability of ANNs to model input-output relations without any knowledge
about the relations is called model freedom. The ANN can be considered as a black box
with some specific features dependent on the ANN architecture. Statistical approaches
need sufficient knowledge about the data to build models. These models often become
very complex. ANNs can be trained by example, where the internal model is generated by
training the network. During training the network adapts to features of the input
set. It is not clear what kind of features these are, but, for example, training an ANN on
averaging patterns leads to a minimisation of the output variance, because the sum
squared error (EP) of an output pattern is equivalent to the variance (σ2) of this pattern:

$$E_P = \frac{1}{N}\sum_j \left(t_{Pj} - o_{Pj}\right)^2 \quad \text{with } t_{Pj} = \text{target pattern} = \text{average value},\ o_{Pj} = \text{output pattern}$$

$$\sigma^2 = \frac{1}{N}\sum_i \left(x_i - \mu\right)^2 \quad \text{with } \mu = \text{average value},\ x_i = \text{distribution value}$$

On the other hand this means that the ANN must internally model the variance of the
input distribution. Depending on the adaptation of statistical features, a multilayer ANN
detector can deliver better results for input distributions that cannot be modelled, or can
only be modelled by complex distribution functions.

Conventional detectors are mostly optimal for only one kind of clutter distribution
function. In the next chapter experiments are described that consider the adaptation of
ANNs to more than one clutter distribution which would be a main advantage against
conventional detectors.

6. Experiments
As shown by Ramamurti (1993) ANNs can be trained to detect radar signals in non-
Gaussian clutter. In his study Ramamurti trained ANNs separately on non-Gaussian
clutter and demonstrated that ANNs yield performance improvements over locally
optimum detectors. In the following experiments the performance of ANNs in mixed
clutter environments will be considered. Mixed clutter means that one ANN should be
trained on different non-Gaussian distributions and be able to adapt to all of them.

6.1 Weibull PDF

The Weibull probability density function (PDF) can be used as a good approximation of
various clutters. Its two parameters (a, b) specify the shape (a) and scale (b). Varying
the shape parameter leads to different non-Gaussian distributions. A special case with
a=2 is the Rayleigh distribution which is a good approximation for target signals, Farina
(1987). The Weibull PDF is defined by

$$f(x) = \frac{a}{b}\left(\frac{x}{b}\right)^{a-1} e^{-\left(\frac{x}{b}\right)^{a}}.$$


Figure 20, different Weibull PDFs (f(amplitude) for b = 1 and shape parameters a = 1.3, a = 2.0 and a = 3.0)

Figure 20 displays the PDF for b = 1 and different values of a. The distribution (v)
has a mean value defined by

$$E[v] = b \cdot \Gamma\!\left(1 + \frac{1}{a}\right)$$

and a mean square value defined by

$$E[v^2] = b^2 \cdot \Gamma\!\left(1 + \frac{2}{a}\right) \qquad [1]$$

where

$$\Gamma(x) = \int_{t=0}^{\infty} e^{-t}\, t^{x-1}\, dt$$

is called the gamma function.

The cumulative distribution function (CDF) returns the probability p(x) that a Weibull
distributed value falls in the interval [0, x].


$$p(x) = \int_{x'=0}^{x} f(x')\,dx' = 1 - e^{-\left(\frac{x}{b}\right)^{a}}$$

Weibull distributed random variables can be generated from a random variable u,
uniformly distributed over the interval [0, 1], by inverting the CDF:

$$x = b \cdot \left(-\ln(1 - p(x))\right)^{\frac{1}{a}}$$

$$x = b \cdot \left(-\ln(u)\right)^{\frac{1}{a}}. \qquad [2]$$
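A minimal sketch of eqn. [2]; the shifted uniform draw avoids taking the logarithm of
zero:

import numpy as np

def weibull_samples(a, b, size, seed=0):
    rng = np.random.default_rng(seed)
    u = 1.0 - rng.random(size)             # uniform over (0, 1]
    return b * (-np.log(u)) ** (1.0 / a)   # eqn. [2]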

6.2 SCR, PFA and PD

The ratio between signal and clutter level is called the signal to clutter ratio (SCR) and
can be calculated as

$$SCR = \frac{\text{average signal power}}{\text{average clutter power}} = \frac{E[s^2]}{E[n^2]} \qquad [3]$$

where the average power is equal to the mean square value of signal or clutter
distribution respectively.

The most important parameters of a CFAR-detector are the false alarm rate PFA and the
detection performance PD. The criterion for a good performance is to maximise the
detection performance PD for a low PFA at a certain SCR.

$P_{FA}$ and $P_D$ are defined as follows for a given threshold T, clutter distribution
$f_{CLUT}(x)$ and joint distribution $f_{CUT}(y)$ of clutter and signal together in the cell
under test (CUT):

$$P_{FA} = P(X > T) = \int_{x=T}^{\infty} f_{CLUT}(x)\,dx = e^{-\left(\frac{T}{b}\right)^{a}} \qquad [4]$$

$$P_D = P(Y > T) = \int_{y=T}^{\infty} f_{CUT}(y)\,dy.$$


Unfortunately the joint distribution function of signal+clutter in the CUT is unknown.
Only the distribution functions for clutter alone and signal alone are available for
calculations, therefore it is not possible to calculate PD directly for a given threshold. A
method to obtain PD iteratively is shown in Ravid and Levanon (1992).

For the case of a Rayleigh target and Weibull clutter, the following relationship holds

$$P_D = e^{-\frac{\left(\frac{T}{b}\right)^{2}}{(1 + SCR)\cdot\Gamma\left(1+\frac{2}{a}\right)}}$$

from which the triple relationship between SCR, $P_{FA}$ and $P_D$ for a fixed-threshold
(non-CFAR) detector can be obtained:

$$SCR_{\infty} = \frac{\left(\ln\frac{1}{P_{FA}}\right)^{\frac{2}{a}}}{\ln\left(\frac{1}{P_D}\right)\cdot\Gamma\left(1+\frac{2}{a}\right)} - 1.$$

Due to the fact that analytical methods are not available for evaluating the performance
of ANN-CFAR detectors, $P_{FA}$ and $P_D$ have been calculated from simulation experiments.
The positions of targets are normally known during experiments. A comparison between
known target positions and detected target positions leads to the numbers of false and
correct detections. $P_{FA}$ and $P_D$ are calculated by

$$P_{FA} = \frac{\text{false detections}}{\text{number of all tested clutter cells}}$$

$$P_D = \frac{\text{correct detections}}{\text{number of all targets}}$$
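A minimal sketch of this empirical evaluation, given the detector decisions and the known
target positions as boolean arrays (names and encoding are assumptions):

import numpy as np

def empirical_rates(decisions, is_target):
    # decisions, is_target: boolean arrays over all tested cells
    false_det = np.sum(decisions & ~is_target)
    correct_det = np.sum(decisions & is_target)
    p_fa = false_det / np.sum(~is_target)    # false detections / clutter cells
    p_d = correct_det / np.sum(is_target)    # correct detections / targets
    return p_fa, p_d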


6.3 Threshold Calculation

Equation [4] for the false alarm rate can be solved for the threshold T:

$$T = b \cdot \left(-\ln(P_{FA})\right)^{\frac{1}{a}}$$

Using a given T, the fixed value α for the CA-CFAR detector can be obtained from

$$\alpha = \frac{T}{\bar{x}} \quad \text{with } \bar{x} = \text{mean value}.$$

The following table lists the threshold T and fixed value α for a scale parameter b = 1 and
four different shape parameters of the Weibull distribution.

Weibull shape parameter   threshold T   mean x̄   α
a = 2.0                   2.6283        0.8862   2.9657
a = 1.6666                3.1886        0.8935   3.5686
a = 1.3333                4.2609        0.9191   4.6361
a = 1.0                   6.9077        1.0      6.9077

Table 1
In practice the number of samples used to calculate the mean value x̄ is limited by the
limited reference window size. This results in a variance in the mean values x̄, with the
effect that α has to be chosen a bit higher than calculated.
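The table entries can be reproduced in a few lines; the entry T = 6.9077 for a = 1 equals
−ln(0.001), so a false alarm rate of 0.001 is assumed here:

from math import gamma, log

P_FA, b = 0.001, 1.0
for a in (2.0, 1.6666, 1.3333, 1.0):
    T = b * (-log(P_FA)) ** (1.0 / a)   # threshold, from eqn. [4]
    mean = b * gamma(1.0 + 1.0 / a)     # mean value E[v]
    print(a, T, mean, T / mean)         # shape, threshold, mean, alpha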

6.4 Performance Criteria (CFAR-loss)

A detector has to estimate in some way the underlying clutter distribution to be adaptive
to it. In the case of Weibull distributed clutter this can be done in several ways, either by
considering the mean and variance of the distribution, or by estimating the scale and
shape parameters.

Due to the limited size of the reference window, this estimation is reduced in accuracy,
which leads to a variance in the estimated value and therefore to a loss in detector
performance.

The CFAR loss is the ratio between the SCR of a CFAR detector and the SCR∞ of a fixed
threshold (non-CFAR) detector, where the parameters are known a priori and the
threshold is a constant that is optimal for one distribution, Ravid and Levanon (1992).
For a required PFA the CFAR detector has, for the same SCR, a lower PD than a non-CFAR
detector, because of the variance in the estimated threshold. This can also be viewed the
other way around: if both detectors have the same PD for a required PFA, the CFAR
detector needs a stronger signal amplitude to achieve PD, which means that SCR∞ has
a lower value than SCR. This leads to the expression for the CFAR-loss

$$\text{CFAR loss} = \frac{SCR(P_{FA}, P_D)}{SCR_{\infty}(P_{FA}, P_D)} \qquad [5]$$

6.5 Training Set

The data set on which the performance tests and evaluation should be based will be
defined in two independent terms, background clutter distribution and target signal
scenarios. The range and number of clutter distributions is picked in such a way that a
wide range of distributions is represented in the data set and also training and testing are
possible in a reasonable time.


6.5.1 Background Clutter Distribution

The training set will be based on:

• 4 different Weibull distributions (shape parameter a varies from 1.0 to 2.0 in 0.333
steps)

• 1 clutter level (scale parameter b=1)

Each clutter distribution contains 130 data samples and will be created 10 times from
uniformly distributed random variables (see eqn. [2]). This leads to 1300 data samples for
each of the four distributions.
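A minimal sketch of generating this clutter data with the transformation from eqn. [2]
(the array layout is an assumption):

import numpy as np

shapes = (1.0, 1.3333, 1.6666, 2.0)   # Weibull shape parameters a
b = 1.0                                # one clutter level (scale parameter)
rng = np.random.default_rng(0)

clutter = {}
for a in shapes:
    u = 1.0 - rng.random((10, 130))              # 10 realisations of 130 samples
    clutter[a] = b * (-np.log(u)) ** (1.0 / a)   # eqn. [2]: 1300 samples per shape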

6.5.2 Target Signal Scenarios

Background clutter is randomly distributed and independent of target signals, which
means that one cannot infer target signals from patterns within the clutter, and further
that it does not matter where within the clutter a target appears. It is of more interest
how many targets appear, what distance they have to each other (interfering) and what
amplitude they have. This project is based on single target scenarios and single cell
targets, which means that not more than one target at a time is visible to the detector.

As mentioned before in this section, a good distribution function to approximate target
signal returns is the Rayleigh distribution, a special case of the Weibull distribution,
which will be used in these experiments to generate target signal returns. The amplitude
of target signal returns varies (e.g. depending on the surface and distance of a target).
Therefore it is likely that the ANN detector must be trained on different SCR levels to
gain high detection performance, otherwise the detector could adapt to specific
amplitudes and not to signals.

Clutter and signal are added and the SCR can be calculated using eqn. [1] in eqn. [3].


6.6 Training the MLP

Choosing the right training set was a time consuming problem in this project. To achieve
good generalisation capabilities in approaches for pattern recognition, the training set of
the ANN is kept small compared to the whole input domain.

The training set for mixed clutter environments had to be sufficiently large, otherwise the
MLP started to memorise the input patterns instead of adapting to the specific
distribution of input patterns.

Memorising input patterns can be avoided by decreasing the number of hidden units. A
smaller number of hidden units forces the net to generalise. On the other hand, if the
number of hidden units gets too small, the ability to generalise becomes limited, which
means in the case of mixed clutter environments a decrease in the ability to learn the
adaptation to different clutter distributions.

Initial experiments showed that the larger the training set the better the detector
performance becomes on an arbitrary test set containing patterns from the same
distribution, because in the case of radar signal detection in mixed clutter environments
the input set is of theoretically infinite size.

During the final experiment the best training strategy was to train the MLP first on a
training set with samples from only one or two Weibull distributions with a long tail (i.e. a
small shape value, a = 1.0). This initialised the weights with values for the worst case
clutter situation (highest peaks). The second training phase was then based on the weight
set of the first training. For this training a training set with 30,000 patterns from all four
Weibull distributions was chosen.

Training the network on the second training set resulted in a higher false alarm rate for
distributions with long tails than for distributions with short tails. Therefore the training
set was changed to be asymmetric. It contained 12000 patterns for a shape parameter
a = 1.0, 9000 patterns for a = 1.3, 6000 patterns for a = 1.6 and 3000 patterns for
a = 2.0. Training the network on this large training set was very time consuming. The
resulting false alarm rate was not the same for all distributions; it varied over an interval
of 0.03 for a = 1.0 and 0.002 for a = 2.0.

6.7 Approaches

The aim of these experiments is to evaluate the performance of ANN-CFAR detectors in
a mixed clutter environment. Two design factors influence the performance of
conventional CFAR detectors: the threshold estimation algorithm and the reference
window size.

The threshold algorithm estimates from a set of input examples the threshold to separate
target signal returns from background clutter. Most of the threshold algorithms are only
optimal for one type of clutter distribution. The threshold estimation performance
depends strongly on the number of input examples presented to the algorithm. As
mentioned before, this size is limited in radar applications.

In initial experiments all ANN architectures showed problems in learning different clutter
distributions. To be able to reason only about the adaptation to different clutter
distributions, different clutter levels were not considered in this project. However,
training an ANN on different clutter levels appeared not to be a problem, as already
considered by Ramamurti (1993).

It is necessary to distinguish between two main types of ANN-CFAR detectors:

• Trained on target signal detection: the ANN detector is trained to output directly a
detection decision, either target detection “1” or non-target detection “0”.

• Trained on threshold estimation: the ANN detector is trained to estimate an optimal
threshold according to the current clutter distribution.


Three ANN approaches were chosen for the experiments. First, a multilayer perceptron
(MLP) architecture with a sliding reference window as input, similar to previous
approaches in ANN radar signal detection, Ramamurti (1993) and Amoozegar and
Sundareshan (1994). Second, a simple recurrent network (SRN) architecture was
selected, with one input unit and a set of context units that should memorise previous
inputs as reference to the actual input. The SRN was trained both on detection and on
threshold estimation. This was not done for the MLP, although it would be worthwhile.
The third architecture was a cascaded network (CN) with the ability to generate different
sets of weights according to features of the input distribution.

This section describes experiments with the three ANN-CFAR detectors and compares
the performance. As a basis for comparison a separate CA-CFAR detector for each
considered background clutter distribution was chosen as a reference. In the next chapter
the performance of ANN-CFAR detectors will be compared with the conventional CA-
CFAR detector.

6.7.1 Multilayer Perceptron Approach

MLPs, as mentioned before, are able to perform radar signal detection in homogeneous
clutter environments. In these experiments, MLPs were trained on target signal detection
in a mixed clutter environment. All experiments were based on the same data set
described earlier in this section. The MLP should adapt to four different Weibull
distributions; therefore an MLP can either be trained directly on all four distributions, or
specialised MLPs can each be trained on one distribution. These specialised
networks/units can be grouped together, and a switching between them can be achieved
through additional units.


Three different types of MLPs were considered for the experiments:

• MLP with no hidden units and one output unit, to support the assumption that even a
simple MLP can be trained on target signal detection.

• MLP with several specialised hidden units and one output unit, where each hidden
unit is separately trained on one clutter distribution.

• MLP with several hidden units and one output unit directly trained on all four
distributions.

Amoozegar and Sundareshan (1994) considered the robustness of an MLP when the size
of the reference window is small. They observed that the detection performance starts to
degrade very sharply when the reference window size gets smaller than 30, and that a
CA-CFAR detector completely loses its detection capability for window sizes less than 9,
where the MLP detector still delivered appropriate performance.

To avoid a strong dependency on the reference window size, a window size of 31 was
chosen in the experiments, to consider only the performance dependency on different
clutter distributions. All networks were trained with a ‘fast’ backpropagation algorithm
with momentum and adaptive learning rate, Fu (1994).

Experiment No. 1

The first experiment should support the assumption stated in Chapter 5.1.2 that a neural
network with a set of input units and only one output unit can be trained on radar target
signal detection. Therefore four (31/1) architectures were trained each on the training set
of one background clutter distribution. The training turned out to be extremely
dependent on the initial weight set. However, the number of learning epochs achieved in
this experiment is given in Table 2 for an error goal of 0.1:


Weibull shape parameter   learning epochs
a = 2.0                   2000
a = 1.6666                973
a = 1.3333                960 + 3107 (see note below)
a = 1.0                   3107

Table 2
Initial training cycles tended to last longer for Weibull distributions with a longer tail
(smaller shape parameter). A longer tail with respect to the Weibull PDF means that the
values on the right hand side of the mean value are spread over a larger interval than on
the left hand side. This is equal to a higher variance in the tail, which complicates the
learning.

The table above also shows this tendency, with one exception for a shape parameter a =
1.6666, which might result from random weight initialisation. In the case of a = 1.3333
no initial weight set generated by the random initialisation routine led to a successful
training. The weights were instead initialised with the weights for a shape parameter a =
1.0 and then retrained for a shape parameter a = 1.3333.

Note: initialised with the weight set from the unit trained on a = 1.0; training was not
possible without this initialisation, therefore the whole number of learning epochs was
3107 + 960.


Figure 21, ANN performance for different Weibull clutter distributions (four panels, Pd vs. SCR [dB], for a = 2.0, a = 1.6666, a = 1.3333 and a = 1.0)

The performance of each network was compared, for a false alarm rate of 0.002, with
separate CA-CFAR detectors, each optimal for one distribution. As shown in Figure 21
the ANN (dotted line) has almost the same performance as the CA-CFAR detector (solid
line). This experiment shows clearly that an ANN with only one output unit and a single
layer of weights is able to learn radar signal detection through training.


Experiment No. 2

This experiment considers a modular approach with five hidden units and one output
unit. Four hidden units were specialised, each on one of the four background clutter
distributions. One extra hidden unit supported the output unit with features of the
network input. The aim of this experiment was to investigate whether one MLP with
specialised hidden units can attain, on each of the four clutter distributions, the same
performance as each specialised hidden unit attains on its specific distribution.

The network was trained in two separate steps, inspired by cascade correlation training
as described in Fu (1994), p. 158. First, four different specialised MLPs with 31 input
units and only one output unit were trained, each on one background clutter distribution.
Then four of the five hidden units of the final MLP were each assigned the weights of one
specialised output unit. These weights were fixed and the whole network was trained on
all four distributions, so that the output weights and the weights of the fifth hidden unit
could achieve an adaptation to all four distributions. The training took 962 epochs for a
false alarm rate of 0.001 and an error goal eg = 0.1.

In Figure 22 the performance of the combined specialised hidden unit approach is
compared with the performance of each specialised unit itself.


Figure 22, Performance of MLP with specialised hidden units (four panels, Pd vs. SCR [dB], for a = 2.0, a = 1.6666, a = 1.3333 and a = 1.0)

Figure 22 shows further that the detection rate starts to decrease again for higher SCR
levels. Unfortunately the false alarm rate varied over a large interval on the test set, as
shown in Table 3. The variation is a criterion for robustness in mixed clutter
environments, which means in this case of specialised hidden units that the detector had
poor performance.


Weibull shape parameter   PFA, specialised approach (31/5/1)   PFA, single unit approach (31/1)
a = 2.0                   0.0002                               0.0026
a = 1.6666                0.0029                               0.0021
a = 1.3333                0.0136                               0.0018
a = 1.0                   0.0329                               0.0020

Table 3
During training the whole network was able to learn the training set, which shows that
the network is able to learn and memorise a specific training set but lacks the ability to
generalise from the training set to the test set.

Experiment No. 3

In contrast to the previous experiment, the MLP in this experiment was trained directly
on all four clutter distributions, in order to compare the performance of specialised
hidden units with non-specialised ones. The network was chosen to be of the same size
(31/5/1) as in the previous experiment. During training the network turned out to be
much more dependent on the initial weight set than in previous experiments. In most of
the training runs the network remained at the same SSE for more than 300 training
epochs before it started to learn. Training this network was time consuming and only
successful in a few cases. A better way to train the network was to start with a (31/11/1)
architecture and prune it down in three steps to (31/5/1), as described in the next
experiment.


The performance in this experiment was better than the performance of the specialised
approach in the previous experiment. In Figure 23 the MLP performance (dotted line) is
compared with CA-CFAR detectors (solid line) that are optimal for the achieved false
alarm rates listed in the next experiment.

[Figure: four panels (a = 2.0, a = 1.6666, a = 1.3333, a = 1.0), each plotting Pd against SCR [dB]]

Figure 23, Performance of MLP with 5 equally trained Hidden Units

A possible explanation for the better performance is given in Section 7.


Experiment No. 4

The previous experiment already showed that MLPs are able to perform CFAR detection
in mixed clutter environments. This final MLP experiment was done to determine an
optimal number of hidden units for the test set. An MLP with a (31/11/1) architecture
was trained directly on all four clutter distributions and afterwards pruned. This was
done by cutting away, in each step, the two hidden units that had the highest number of
weights close to zero. Hinton diagrams are a good way to view all weights of one layer.
In this experiment some neurons always contained very small or zero weights. These
neurons could be identified in the Hinton diagram and the corresponding rows deleted
from the weight matrix. Initial experiments showed that further training was most
successful if not more than two neurons were pruned in one step.
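A minimal sketch of one such pruning step (the threshold eps and the array layout are
illustrative assumptions, not the original implementation):

import numpy as np

def prune_two_units(W_hidden, W_output, eps=1e-2):
    # W_hidden: (n_hidden, n_in + 1) input-to-hidden weights (bias column last)
    # W_output: (1, n_hidden + 1) hidden-to-output weights (bias column last)
    near_zero = (np.abs(W_hidden) < eps).sum(axis=1)  # near-zero weights per unit
    drop = np.argsort(near_zero)[-2:]                 # two units with most of them
    keep = [i for i in range(W_hidden.shape[0]) if i not in drop]
    bias_col = W_output.shape[1] - 1
    return W_hidden[keep], W_output[:, keep + [bias_col]]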

Figure 24 compares the performance of the (31/11/1) architecture (dotted line) with a
fixed threshold detector (solid line) optimal for the achieved false alarm rates. The MLP
performs with a monotonically increasing detection rate over an SCR range of −20 dB
to 20 dB.


[Figure: four panels (a = 2.0, a = 1.6666, a = 1.3333, a = 1.0), each plotting Pd against SCR [dB]]

Figure 24, Performance of a (31/11/1) MLP

To show the variation of the false alarm rate for different distributions, Table 4
compares the range of the false alarm rate for 11, 9, 7 and 5 hidden units. The best
performance was achieved with the (31/11/1) architecture, which had the smallest
variation in the false alarm rate. To compare the false alarm intervals with each other,
the quotient between the upper and lower bound of each interval has to be used; for the
(31/11/1) network, for example, 0.028793 / 0.001803 ≈ 15.97. The absolute interval
range (difference between upper and lower bound) could only be used if the intervals
did not overlap and either started or ended with the same value.


Weibull shape     False Alarm Rate PFA with architecture
parameter         (31/11/1)    (31/9/1)     (31/7/1)     (31/5/1)
a = 2.0           0.001803     0.001452     0.000150     0.0000834
a = 1.6666        0.007011     0.014522     0.001202     0.0004075
a = 1.3333        0.015223     0.006259     0.003806     0.0008849
a = 1.0           0.028793     0.028242     0.010916     0.0100430

Interval scale
PFA(a=1.0) / PFA(a=2.0):
                  15.97        19.4504      72.7733      120.41

Table 4

6.7.2 Simple Recurrent Network Approach

In these experiments the ability of simple recurrent networks (SRNs) to memorise
previous states is compared with the sliding window approach of the former
experiments. An Elman architecture with one input unit and one output unit was
chosen. The good performance of recurrent neural networks in time series prediction
inspired the choice of training on threshold estimation. The threshold depends on the
statistical distribution of the input set, which is not a time series; instead it is a series of
values, distributed according to a distribution function, where each value depends on
the set of previous values. This leads to the assumption that it is also possible to predict
the threshold from previous values.

SRN trained on target signal detection

The SRN was trained to detect whether the current input value is a target signal. The
network was therefore trained on all four clutter distributions from the training set,
including different levels of target signals. After training, the performance was compared
with the previous MLP approaches that were based on a sliding window as input.
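A minimal sketch of one step of such an Elman network (sizes, initialisation and
activation functions are illustrative assumptions, not the original implementation):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ElmanDetector:
    def __init__(self, n_hidden, rng=np.random.default_rng(0)):
        # weights from (1 input + n_hidden context + bias) to the hidden layer
        self.W_in = rng.normal(scale=0.1, size=(n_hidden, 1 + n_hidden + 1))
        self.W_out = rng.normal(scale=0.1, size=(1, n_hidden + 1))
        self.context = np.zeros(n_hidden)      # copy of the previous hidden state

    def step(self, x):                         # x: one radar return sample
        z = np.concatenate(([x], self.context, [1.0]))
        self.context = sigmoid(self.W_in @ z)  # new hidden state becomes context
        return sigmoid(self.W_out @ np.append(self.context, 1.0))[0]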

SRN trained on threshold prediction

The SRN was trained in the same way as in the previous experiment, with the difference
that the threshold of an optimal detector for the specific clutter distribution and a false
alarm rate of 10^-3 was chosen as target pattern. By predicting the next possible
threshold value, the SRN might achieve a better performance than a CA-CFAR detector
in a changing clutter environment, because predicted values might adapt earlier to
changes in the clutter level. After training, the performance was compared with previous
approaches.

SRN Performance

For the performance tests the number of weights of the Elman network was chosen, in
both approaches, to be the same as the number of weights in the previous MLP
experiment. This leads to the following calculation of hidden units:

HiddenUnits = −1 + √(MLPWeights + 1) .
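This relation follows if, as the formula suggests, bias weights are not counted: a
one-input Elman network with H hidden units then has (1 + H) ⋅ H input-and-context-
to-hidden weights plus H hidden-to-output weights, i.e. H² + 2H in total, and solving
H² + 2H = MLPWeights gives the formula above. A small sketch (the weight budget of
360 is a hypothetical example):

import math

def elman_hidden_units(mlp_weights):
    # hidden units H with H^2 + 2H = mlp_weights (no bias weights counted)
    return round(math.sqrt(mlp_weights + 1) - 1)

print(elman_hidden_units(360))  # -> 18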

Training the network was not successful in either approach. After 20,000 learning
epochs the training remained at a high sum squared error of 19 (target signal detection)
and 37 (threshold prediction), which led to a performance of both detectors far behind
the performance of the MLPs and the conventional detectors. A detailed performance
evaluation of this approach is therefore omitted; instead, a possible explanation of the
failure is given in Section 7.2.


6.7.3 CN Approach

As we have seen from previous experiments, a relatively small MLP is able to detect
radar return signals in one kind of background clutter. To achieve an increased
performance over different clutter distributions, the MLP approach was extended to a
cascaded network (CN). The context network of the CN offers the MLP a possibility to
generate different sets of weights, where each weight set could be optimal for one kind
of clutter distribution. As function network, an architecture with 31 input units, 3
hidden units and 1 output unit was chosen. To enable the context network to carry out
the switching, all 31 input units were also fed into the context network. The output
values of the context network are the weights of the function network; this leads to 100
output units, calculated as follows:

OC = (IF + bias) ⋅ HF + (HF + bias) ⋅ OF = (31 + 1) ⋅ 3 + (3 + 1) ⋅ 1 = 100

with OC = context output units, IF = function input units, HF = function hidden units,
OF = function output units.

A hidden layer introduced in the context network has two purposes in this experiment.
One purpose is to reduce the number of weights in the context network. With 31 input
units and 100 output units, the context network would have 3200 weights without any
hidden layer, which influences the learning speed enormously. To keep the number of
weights small, 3 context hidden units were chosen in this example; each additional
hidden unit would add over 130 weights ((31 + 1) incoming and 100 outgoing). The
total number of context weights results in

WC = (IC + bias) ⋅ HC + (HC + bias) ⋅ OC = (31 + 1) ⋅ 3 + (3 + 1) ⋅ 100 = 496

with WC = context weights, IC = context input units, HC = context hidden units,
OC = context output units.
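To make the weight-generation mechanism concrete, a minimal sketch of the cascaded
network forward pass is given below (after Pollack, 1987; sigmoid activations and
random weights are assumptions for illustration, not the original implementation):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W_ch = rng.normal(scale=0.1, size=(3, 31 + 1))   # context hidden weights: (31+1)*3
W_co = rng.normal(scale=0.1, size=(100, 3 + 1))  # context output weights: (3+1)*100

def cascaded_forward(x):                      # x: 31 reference window values
    h_c = sigmoid(W_ch @ np.append(x, 1.0))   # context hidden layer (bias appended)
    w = W_co @ np.append(h_c, 1.0)            # the 100 virtual function net weights
    W_fh = w[:96].reshape(3, 32)              # (31+1)*3 = 96 hidden weights
    W_fo = w[96:].reshape(1, 4)               # (3+1)*1  =  4 output weights
    h_f = sigmoid(W_fh @ np.append(x, 1.0))   # function net hidden layer
    return sigmoid(W_fo @ np.append(h_f, 1.0))  # detection output in (0, 1)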

For the whole cascaded network the context weights have to be added to the number of
virtual function weights that are calculated by the context network. This leads to a total
number of 596 weights, which is more than twice as many as an MLP with a (31/8/3)
architecture.


The second purpose of a hidden layer is to give the network one more layer of weights
that enables it to model more complex input-output dependencies.

The CN was trained in the same way as the MLP approaches. After 130 learning cycles
the network already achieved a much lower sum squared error (SSE) than the MLP
mentioned above, but 1000 learning cycles later the CN got stuck at an SSE of 19.
Various attempts at raising and lowering the learning rate only reduced the SSE to a
final value of 9, which resulted in the same poor performance as the SRN.

7. Evaluation

7.1 Specialised hidden units compared with direct training

It was shown that an MLP architecture with specialised hidden units achieved lower
performance than an MLP architecture of the same size trained directly on all
distributions. The reason for this behaviour may lie in the network's ability to
distinguish between different background clutter distributions, which is necessary for an
appropriate threshold calculation. The specialised MLP contained four specialised
hidden units with fixed weights and a fifth additional hidden unit. The final training on
all four input distribution functions adjusted only the weights of the additional hidden
unit and the output unit. The purpose of this training was to introduce the ability to
calculate an appropriate detection result according to one of the four input distributions.

After the final training, the performance on the test set resulted in a high false alarm rate
compared to an MLP of the same size trained directly on all four distributions. The
specialised MLP had only one hidden unit and one output unit available to distinguish
between the input distributions, whereas the MLP of the same size trained directly on
all distributions could share units between both tasks: threshold calculation and
distribution distinction.


7.2 Limited learning abilities of SRNs

Training an SRN on all four clutter distributions remained, after 20,000 epochs, at an
SSE of 19, which resulted in a performance far behind the other detectors. Bengio et al.
(1994) considered the ability of SRNs to memorise knowledge. They showed that the
memory capabilities of SRNs decrease exponentially, so that only recently passed states
strongly influence the current state. Statistical estimation requires a sufficiently large
sample set; otherwise the variance of the estimate increases, which degrades the
detection performance. The context units of SRNs do not provide this sufficiently large
set, and the network therefore has a limited ability to learn an adaptation to different
distribution functions.

7.3 Comparison between MLP and CN architecture

The CN approach showed that a CN does not achieve a good performance on the
mixed clutter environment test set. The training was not successful and remained at an
SSE of 9. On the other hand, previous experiments had already shown that simple MLP
architectures, smaller than the function network, perform well on single homogenous
clutter distributions; the ability of the context network to switch between different
weight sets must therefore be the reason for the failure.

The context network consisted of 31 input units, 3 hidden units and 100 output units.
One assumption is that the information about the input clutter distribution retained by
only three context hidden units is not sufficient to generate an appropriate weight for
each of the 100 output units. More hidden information can be provided by adding more
hidden units. Unfortunately, as mentioned before, this leads to a drastic increase in
context weights and therefore to slower learning. More weights can also make training
more complex, which tends to paralyse the learning.


Reducing the function net to a very small net offers no architectural advantage over a
CA-CFAR detector. The context network would in this case do nothing other than
predict the optimal weights according to the input distribution. This is similar to an
approach that estimates the parameters of the input distribution or estimates the fixed
threshold parameter (alpha), as pointed out in future work. Such networks would
probably have fewer weights than the context network, because they do not have as
many outputs.

The comparison of the CN approach with the MLP approach shows that MLP
architectures with less than half the number of weights are able to learn the whole task
and adapt to all four distributions.

7.4 Comparison between ANN-CFAR and CA-CFAR

This project investigates the robustness of ANNs in mixed clutter environments. As
mentioned before, most conventional detectors are only optimal for one clutter
distribution; changes in the clutter environment lead to changes in the false alarm rate
and the detection performance. Robustness in this sense means that the false alarm rate
should remain constant if the environment changes (CFAR), which is very hard to
achieve in practice. The interval of the false alarm rate over different distributions can
be used as a measurement for robustness. Table 5 shows that the interval scale of the
CA-CFAR (0.057186 / 0.002003 ≈ 28.55) is almost twice as large as that of the MLP.


Weibull shape     False Alarm Rate PFA
parameter         MLP (31/11/1)    CA-CFAR
a = 2.0           0.001803         0.002003
a = 1.6666        0.007011         0.011818
a = 1.3333        0.015223         0.030095
a = 1.0           0.028793         0.057186

Interval scale
PFA(a=1.0) / PFA(a=2.0):
                  15.97            28.55

Table 5
Experiment No. 4 further shows that the detection performance of the MLP-CFAR
detector is also very good in mixed clutter environments, which leads to a better overall
performance of the MLP-CFAR than of the CA-CFAR detector.

8. Conclusion
The aim of this MSc project was to investigate if artificial neural networks (ANNs) can
be trained to perform constant false alarm rate (CFAR) detection of radar signals in
mixed clutter environments.

Conventional radar signal detectors are commonly designed for homogenous clutter
environments. In practice, real-life clutter is difficult to model accurately, because it is
non-homogenous and non-stationary, and distribution functions can at best be a good
approximation.

ANNs learn from sample data. If they are able to learn CFAR detection in previously
known mixed clutter environments, as considered in this thesis, then they should also be
able to learn CFAR detection from real data measured by a radar in different clutter
environments. The distributions of this clutter need not be known, because the ANN
can build its own models from the measured data and be near-optimal for this
environment.

8.1 Approaches

ANNs in their simplest form, with only one input layer and a single output unit, are
applicable to CFAR detection in homogenous clutter environments. On the other hand,
these ANNs gain no performance advantage over conventional detectors. This project
therefore considered more complex ANN architectures to achieve performance
advantages in mixed clutter environments:


• An MLP architecture with a sliding reference window as input, similar to previous
ANN approaches in radar signal detection.

• An SRN architecture with one input unit and a set of context units that should
memorise previous inputs as reference to the actual input.

• A CN architecture with the ability to switch between different weight sets according
to input distribution features.

8.2 Achievements

The conventional CA-CFAR detector, which is only optimal for one type of clutter
distribution, produces a high variation in the false alarm rate if it is used for clutter
distributions other than the one it was designed for. The variation of the CA-CFAR
detector was almost twice as large as the variation of the MLP-CFAR detector used in
this project.

This indicates that MLPs achieve better performance characteristics in mixed non-
Gaussian clutter environments than conventional detectors. Combining this result with
the research of Ramamurti et al. (1993) and Amoozegar and Sundareshan (1994) shows
that MLPs trained on radar signal detection achieve a better performance than
conventional detectors for small reference window sizes (Amoozegar and Sundareshan),
perform better than conventional detectors in non-Gaussian noise (Ramamurti et al.)
and have the ability to perform detection as a universal detector in mixed non-Gaussian
clutter environments.

8.3 Future work

Several directions are possible for future work. It has not yet been shown that all
features achieved in independent research activities can be combined in one detector,
either with a single ANN or with a hybrid of different approaches.


During the experiments the main problem was how to train ANNs effectively on
different distribution functions. Training a network not only on different clutter
distributions, but also on different clutter levels, can become a time-consuming problem.
The main problem for the ANNs was learning to distinguish between different
distributions. An investigation of parameter estimation of distribution functions, or of
estimation of the fixed threshold parameter α, could therefore be worth doing. The
estimation capabilities of ANNs should be compared with statistical estimation results,
for example with results from the maximum-likelihood (ML) algorithm, Ravid and
Levanon (1992).
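As a sketch of such a comparison baseline (assuming SciPy's Weibull implementation;
the window size of 31 samples matches the reference window used in the experiments):

import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(0)
window = weibull_min.rvs(1.3333, scale=1.0, size=31, random_state=rng)

# maximum-likelihood fit of the shape parameter, location fixed at zero
shape, loc, scale = weibull_min.fit(window, floc=0)
print(shape, scale)   # estimates against which an ANN estimator could be compared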

Estimated parameters obtained from statistical algorithms have an estimation variance
that depends on the number of sample values. It is likely that ANNs also deliver better
results, with a lower estimation variance, for this task, because Amoozegar and
Sundareshan (1994) already pointed out that MLPs perform better than conventional
detectors for small reference windows. Detection with small reference windows also
belongs to the category of parameter estimation problems and therefore supports the
assumption that ANNs deliver better performance in parameter estimation than
statistical algorithms.

Estimated parameters output by ANNs can be used for radar signal detection as
described by Ravid and Levanon (1992), who implemented an adaptive threshold
detector for estimated parameters in Weibull clutter.

Two further problems that occur in radar signal detection are the influence of
interfering targets on the detection performance and the increase of the false alarm rate
due to clutter edges, as discussed in Section 2. These problems were not considered in
this project or in any of the previous work. Suggestions for approaches can be found in
the appendix.

9. References
Amoozegar F., Sundareshan M.K. (1994), A Robust Neural Network Scheme for
Constant False Alarm Rate Processing for Target Detection in Clutter
Environment, American Control Conference, Baltimore, Maryland

Barton D.K. (1988), Modern Radar System Analysis, Artech House

Bengio Y., Simard P., Frasconi P. (1994), Learning Long-Term Dependencies with
Gradient Descent is Difficult, IEEE Trans. on Neural Networks, vol. 5, no. 2,
pp. 157-166

Bogler P.L. (1990), Radar Principles With Applications To Tracking Systems, Wiley &
Sons, Inc.

Bucciarelli T., Fedele G., Parisi R. (1993), Neural Networks Based Signal Detection,
NAECON '93, Dayton, Ohio

Elman J.L. (1990), Finding Structure in Time, Cognitive Science 14, pp 179-211, Ablex
Publishing Corporation

Farina A., Studer F.A. (1986), A Review of CFAR Detection Techniques in Radar
Systems, Microwave Journal

Farina A. (1987), Optimised Radar Processors, Peter Peregrinus Ltd. on behalf of the
Institution of Electrical Engineers

Fu L. (1994), Neural Networks in Computer Intelligence, McGraw-Hill


Grossberg S. (1995), Introduction: 1995 Special Issue Automatic Target Recognition,
Neural Networks, vol. 8, no. 7-8, Pergamon

Jacobs R.A., Jordan M.I., Nowlan S.J., Hinton G.E. (1991), Adaptive Mixtures of Local
Experts, Neural Computation, pp. 79-87

Jordan M. (1996), Neural Networks, CRC Handbook of Computer Science, CRC Press,
Florida

Pollack J.B. (1987), Cascaded Back-Propagation on Dynamic Connectionist Networks,
Proc. Ninth Conf. of the Cognitive Science Soc., Seattle

Ramamurti V., Rao S.S., Gandhi P.P. (1993), Neural Networks for Signal Detection in
Non-Gaussian Noise, IEEE 1993 International Conference on Acoustics,
Speech and Signal Processing (ICASSP '93), Minneapolis, MN

Ravid R., Levanon N. (1992), Maximum-likelihood CFAR for Weibull background, IEE
Proceedings-F, Vol. 139, No.3

Rumelhart D.E., McClelland J.L. and the PDP Research Group (1986), Parallel
Distributed Processing, Volume 1, MIT Press, Cambridge

Stimson G.W. (1983), Introduction to Airborne Radar, Hughes Aircraft Company, El
Segundo, California

Appendix
Definitions

http://www.sti.nasa.gov/nasa-thesaurus.html, an electronic version of the NASA
Thesaurus, August 1996.

THERMAL NOISE

The noise at radiofrequency caused by thermal agitation in a dissipative body. Also called
Johnson noise.

CLUTTER

Atmospheric noise, extraneous signals, etc. which tend to obscure the reception of a
desired signal in a radio receiver, radarscope, etc.

FALSE ALARMS

In general, the unwanted detection of input noise. In radar, an indication of a detected
target even though one does not exist, due to noise or interference levels exceeding the
set threshold of detection.


SIGNAL TO NOISE RATIOS

Ratios which measure the comprehensibility of a data source or transmission link, usually
expressed as the root mean square signal amplitude divided by the root mean square
noise amplitude.

http://www.cas.lancs.ac.uk/glossary_v1.1/Alphabet.html, August 1996

Cumulative Distribution Function

All random variables (discrete and continuous) have a cumulative distribution function. It
is a function giving the probability that the random variable X is less than or equal to x,
for every value x.

Estimate

An estimate is an indication of the value of an unknown quantity based on observed data.

PROBABILITY DISTRIBUTION

The probability distribution of a discrete random variable is a list of probabilities
associated with each of its possible values. It is also sometimes called the probability
function or the probability mass function.


Approaches for Interfering Targets and Clutter Edges

Two different types of problems will be addressed:

• Detection of small signals in the neighbourhood of interfering targets.

• Avoidance of false alarms at the border of clutters due to clutter edges.


[Figure: two panels, amplitude against range, with signal and threshold curves; the left
panel shows a target in the cell under test next to an interfering target, the right panel
shows a clutter edge]

Figure 25, Interfering Target, Clutter Edge

Interfering Targets
The problem of interfering targets can, in the case of an MLP, be addressed by a
pre-processing step that filters out all strong target signals, which, due to their high
amplitude, are easy to separate from the background clutter. A more accurate
processing step then follows, which performs an exact separation.
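A minimal sketch of such a pre-processing step (the median-based rule and the factor k
are illustrative assumptions, not a prescribed method):

import numpy as np

def censor_interferers(window, k=3.0):
    # treat reference cells far above the window median as interfering
    # targets and exclude them before the more accurate processing step
    med = np.median(window)
    return window[window <= k * med]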

Modular MLP trained to avoid false alarms at clutter edges

Clutter edges are a problem for detectors based on averaging. An MLP for CFAR
detection is also based on averaging and will produce false detections at clutter edges.
A modular approach could solve this problem: one set of modules is responsible for
target signal detection, and an additional module is trained on clutter edge detection
and signals whether a clutter edge is currently present. A combining module could
merge both results and avoid false detections at clutter edges.


In experiments an MLP could be trained on the detection of clutter edges. The MLP
should give a clutter edge detection signal every time a clutter edge passes the cell
under test and its three closest neighbours on each side. Two experiments should
investigate clutter edge detection with MLPs:

• MLP with several hidden units and one output unit, where each unit has a sigmoid
activation function.

• MLP with several hidden units and one output unit, where all hidden units have a
radial basis activation function and only the output unit has a sigmoid activation
function. The reason for using a radial basis activation function is that the network
could implement a subtraction of the left half of the reference cells from the right
half. If the result is large, positive or negative, a clutter edge is present; a small result
means that the clutter level is nearly the same over the whole reference window. This
fits well with an exponential radial basis function, whose output is equal for large
positive and large negative inputs (see the sketch below).
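A minimal sketch of this radial basis idea (the Gaussian width sigma is an illustrative
assumption):

import numpy as np

def edge_score(window, sigma=2.0):
    # subtract the mean of the left half of the reference window from the
    # right half; a Gaussian radial basis responds equally to large positive
    # and large negative differences, i.e. to edges on either side
    mid = len(window) // 2
    diff = np.mean(window[mid + 1:]) - np.mean(window[:mid])
    return 1.0 - np.exp(-(diff / sigma) ** 2)  # ~0 flat clutter, ~1 at an edge

print(edge_score(np.r_[np.ones(15), 10 * np.ones(16)]))  # edge -> close to 1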
