Kernel Orthonormalization in Radial Basis Function Neural Networks

Wladyslaw Kaminski and Pawel Strumillo

Abstract—This paper deals with optimization of the computations involved in training radial basis function (RBF) neural networks. The main contribution of the reported work is the method for network weights calculation, in which the key idea is to transform the RBF kernels into an orthonormal set of functions (by using the standard Gram–Schmidt orthogonalization). This significantly reduces computing time (in comparison to other methods for weights calculation, e.g., the direct matrix inversion method) if the RBF training scheme, which relies on adding one kernel hidden node at a time to improve network performance, is adopted. Another property of the method is that, after the RBF network weights are computed (via the orthonormalization procedure), the original network structure can be restored (i.e., the one containing RBFs in the hidden layer). An additional strength of the method is the possibility to decompose the proposed computing task into a number of parallel subtasks, thus gaining further savings in computing time. The proposed weight calculation technique also has low storage requirements. These features make the method very attractive for hardware implementation. The paper presents a detailed derivation of the proposed network weights calculation procedure and demonstrates its validity for RBF network training on a number of data classification and function approximation problems.

Index Terms—Approximation, function orthonormalization, RBF neural networks.

I. INTRODUCTION

ADIAL basis function (RBF) networks, which have recently attracted an extensive research interest [1][5], can
be regarded as an amalgamation of a data modeling technique
in a high-dimensional space called RBF [6] and an universal
approximation scheme popularly known as artificial neural
networks (ANNs) [1]. Effectively, a novel and powerful data
processing tool has been proposed [7] in which the exact
interpolation regime required in the original RBF method has
been relaxed and put into a layered network connectionist
structure.
An RBF network approximates an unknown mapping function
, by using a two-layer data processing
structure. First, the input data undergoes a nonlinear transformation via the basis functions in the network hidden layer,
then basis functions responses are linearly combined to give
the network output. Hence, the overall inputoutput transfer
Manuscript received March 19, 1996; revised October 10, 1996 and
April 2, 1997. This work has been carried out as a part of the Research
Project 0399/P4/94/07 sponsored by the Polish State Committee for Scientific
Research.
W. Kaminski is with the Faculty of Process and Environmental Engineering,
Technical University of Lodz, 90-924 Lodz, Poland.
P. Strumillo is with the Electrical and Electronics Engineering Faculty,
Technical University of Lodz, 90-924 Lodz, Poland.
Publisher Item Identifier S 1045-9227(97)05246-6.

Hence, the overall input–output transfer function of the RBF network can be defined as the following mapping:

y(\mathbf{x}) = \sum_{i=1}^{n} w_i \, \varphi(\| \mathbf{x} - \mathbf{c}_i \|)    (1)

where w_i, i = 1, \ldots, n, are the network adjustable weights connecting the network hidden nodes with the network output, \varphi(\| \mathbf{x} - \mathbf{c}_i \|) are radially symmetric transfer functions with centers \mathbf{c}_i, and \| \cdot \| denotes the Euclidean distance. Some of the typical functions used for construction of the hidden kernel nodes in an RBF network are [6]–[10]
\varphi(r) = \exp(-r^2 / \sigma^2)    (2)

\varphi(r) = (r^2 + \sigma^2)^{1/2}    (3)

\varphi(r) = (r^2 + \sigma^2)^{-1/2}    (4)

where r = \| \mathbf{x} - \mathbf{c}_i \| and \sigma is a width parameter; (2) is the Gaussian function, and (3) and (4) are the multiquadric and inverse multiquadric functions, respectively.
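To make these kernel choices concrete, a minimal NumPy sketch of (2)–(4) is given below; the function names, the width parameter sigma, and the vectorized distance computation are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def gaussian(r, sigma=1.0):
    # Eq. (2): Gaussian kernel of the radial distance r
    return np.exp(-(r ** 2) / sigma ** 2)

def multiquadric(r, sigma=1.0):
    # Eq. (3): multiquadric kernel
    return np.sqrt(r ** 2 + sigma ** 2)

def inverse_multiquadric(r, sigma=1.0):
    # Eq. (4): inverse multiquadric kernel
    return 1.0 / np.sqrt(r ** 2 + sigma ** 2)

def hidden_layer_responses(X, centers, phi=gaussian, sigma=1.0):
    # G[p, i] = phi(||x_p - c_i||): response of the i-th kernel node
    # to the p-th input vector; X is P x m, centers is n x m.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return phi(dists, sigma)
```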
One of the key problems when designing an RBF network
is proper choice of the number of basis functions which act
as the receptive field for the network. Too small a number of
these functions may result in poor approximation accuracies
(sampling of the input data too sparse), whereas too large a
number of these functions can cause overfitting of the data and
poor generalization performance on the validation set. Use of
regularization theory for construction of RBF networks has
been proposed to deal with these conflicting requirements [8].
This theory imposes additional smoothness constraints on the
approximation criteria in order to improve generalization on
new data.
For a chosen number and type of network basis functions, two sets of parameters need to be determined (trained) in order to make the network correctly perform the desired data processing task. First, there are the centers \mathbf{c}_i positioning the basis functions within the space spanned by the network input data \mathbf{x}, and second, there is the set of output weights w_i which propagate the basis function responses to be linearly combined at the network output layer [see (1)]. Accordingly, the training task of an RBF network can be broken into two phases: selection of the basis function centers, followed by determination (training) of the output layer weights.
This paper focuses on the second phase of RBF network
training, i.e., the task of searching for the best weights
that would maximize network performance for a given
set of training data and chosen set of basis functions. This

problem becomes particularly acute if very complex mapping functions have to be designed for a massive number of input data in a high-dimensional space. For such cases, the number of network hidden nodes may run into tens or hundreds, and the number of network output weights to be determined grows correspondingly. The solution of such a large optimization problem (although linear) involves high computation costs and puts a high demand on computer storage space. In this paper the following two-stage procedure for calculation of the RBF network output weights is proposed to address this problem. First, the RBFs are transformed into a set of orthonormal functions, for which the optimum network output layer weights are computed. This eliminates the requirement to compute the off-diagonal terms in the linear set of equations for the optimum network weight set. Second, these weights are recomputed in such a way that their values can be fitted back into the original RBF network structure (i.e., with the kernel functions unchanged). It will be shown that such a procedure significantly reduces the weight computing time if the RBF training scheme for improving the network approximation accuracy, which relies on adding one kernel node at a time, is adopted.
The Gram–Schmidt algorithm employed for orthogonalization of basis functions is a well-known standard numerical method [11], and has been used in a variety of applications, e.g., for calculation of Volterra series weights in nonlinear system identification [12] and for the optimum choice of RBF centers [13]. Also, more recently, other orthogonal transformations [like singular value decomposition (SVD) and principal component analysis (PCA)] have been used for optimization of the size and structure of feedforward networks [14], [15]. However, to the authors' best knowledge, the application of the proposed orthogonalization algorithm in the context of the transformation of an RBF network structure, and the advantages it brings to network training, has not been discussed earlier in the literature.
The rest of this paper is organized as follows. Section II mentions various approaches to the determination of RBF network centers and highlights the computations involved in calculating the network output layer weights. Section III provides a detailed description of the proposed kernel orthonormalization procedure. In the next section, the validity of the method introduced for training an RBF network is demonstrated on a number of data classification and function approximation benchmarks. Section IV concludes with a summary of the presented study. Finally, the Appendix explains a recursive procedure introduced to the standard Gram–Schmidt orthogonalization algorithm, allowing for more efficient calculation of the network output weights.
II. RBF NETWORK DESIGN AND TRAINING
Different data processing tasks are allocated to each of the layers of an RBF network. The hidden layer, composed of radially symmetric basis functions, plays the role of a flexible receptive field operating on the input data space. The network output layer, on the other hand, adjusts its weights as it searches for the optimum contribution of each of the basis functions to the network output, so that the best match to the pattern of desired responses is achieved.
Various strategies have been proposed for choosing the number of basis functions and the locations of their centers in an RBF network. A frequently sufficient approach is to choose the centers randomly from the training data set, so that the distribution of centers corresponds to the distribution of the input data [7]. Another possibility is to realize a self-organized choice of centers based, e.g., on the K-means unsupervised learning algorithm [16]. This allows basis function centers to be grouped in statistically important input data regions. Yet another approach, called orthogonal least squares (OLS) learning, used for the optimum choice of the number and locations of basis function centers, was proposed in [13]. Also, very recently, high performance of small RBF networks has been achieved when the centers are optimized by means of genetic algorithms [17].
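As an illustration of the two simplest strategies mentioned above (random selection from the training data and K-means clustering), a short sketch follows; it assumes NumPy and scikit-learn, which are obviously not part of the original study, and the function names are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def random_centers(X, n_centers, seed=0):
    # Draw centers directly from the training inputs, so that their
    # distribution follows the distribution of the input data [7].
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_centers, replace=False)
    return X[idx]

def kmeans_centers(X, n_centers, seed=0):
    # Self-organized choice of centers via K-means clustering [16]:
    # centers gather in statistically important regions of the input space.
    km = KMeans(n_clusters=n_centers, n_init=10, random_state=seed).fit(X)
    return km.cluster_centers_
```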
It must be stressed, however, that design decisions about the number of basis functions and the locations of their centers in an RBF network are made without any knowledge of the data processing task, i.e., of the relations between the network input and the desired responses at the network output. Information about the input data distribution alone is a necessary but frequently not sufficient condition to guarantee high network performance. This is because, irrespective of the method used for basis function selection, the quality of their input data perception can only be evaluated after the output weights have been determined through some optimization procedure.
Since an RBF network features a linear output, the problem of computing the network output weights can be posed as a linear optimization problem. It can be solved either by direct numerical least-squares methods, like the Moore–Penrose pseudoinverse method [18], [19], or by an iterative error-correction learning rule such as the LMS algorithm [20]. These computations (irrespective of the method used) may have to be performed many times, e.g., for an increasing number of network kernel nodes, until the data processing goal is met within an acceptable error margin. Consequently, this training stage of the RBF network may become the main factor deciding the overall demand on computing time and computer storage.
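For reference, the direct least-squares route mentioned above can be sketched with the Moore–Penrose pseudoinverse; this is the kind of baseline the paper compares against, and the NumPy formulation is an assumption of one possible implementation, not the authors' code.

```python
import numpy as np

def output_weights_pseudoinverse(G, d):
    # G: P x n matrix of hidden-node responses, G[p, i] = g_i(x_p)
    # d: NumPy vector of the P desired outputs
    # Returns the weight vector w minimizing sum_p (G w - d)_p^2,
    # i.e., the least-squares solution via the Moore-Penrose pseudoinverse.
    return np.linalg.pinv(G) @ np.asarray(d, dtype=float)
```

Note that with this approach the full solution has to be recomputed from scratch every time a kernel node is added, which is exactly the cost the orthonormalization scheme of Section III avoids.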
III. ORTHONORMALIZATION OF KERNEL FUNCTIONS
In order to keep further derivations compact, a single-output network will be considered; the extension to higher output dimensions will be explained after the key points of the method are presented. Hence, the RBF network structure to be considered, with a scalar output y, is the one shown in Fig. 1.

The training process of the RBF network output layer relies on adjusting the weight values w_i of the network output linear combiner. The objective is to minimize the error between the network output y(\mathbf{x}_p) and its desired value d_p for the available training input–output pairs (\mathbf{x}_p, d_p), p = 1, \ldots, P. The sum squared error is chosen as the network performance criterion:

E = \sum_{p=1}^{P} \left[ y(\mathbf{x}_p) - d_p \right]^2.    (5)

Fig. 1. The RBF network.

A minimum of this cost function can be found by calculating the partial derivatives of (5) with respect to each of the weights w_i and solving the resulting system of n linear equations for w_1, \ldots, w_n. This system of equations, the so-called normal equation system [18], has the form

\sum_{j=1}^{n} w_j \sum_{p=1}^{P} g_j(\mathbf{x}_p) \, g_i(\mathbf{x}_p) = \sum_{p=1}^{P} d_p \, g_i(\mathbf{x}_p), \qquad i = 1, \ldots, n    (6)

in which the substitution g_i(\mathbf{x}) = \varphi(\| \mathbf{x} - \mathbf{c}_i \|) is introduced to simplify the notation. If the locations of the basis function centers, i.e., the vectors of coordinates \mathbf{c}_i, are distinct, then the set of equations (6) is linearly independent and has a unique solution [2]. Additionally, this is the single globally optimum solution to (5).

For the sake of a clearer explanation of the key points of the proposed method for solving (6), the following notation is introduced for the space of multivariable functions. The inner product of two functions is defined as

\langle g_i, g_j \rangle = \sum_{p=1}^{P} g_i(\mathbf{x}_p) \, g_j(\mathbf{x}_p).    (7)

The norm in the functional space is defined on the basis of the inner product definition (7) in the following way:

\| g_i \| = \langle g_i, g_i \rangle^{1/2}.    (8)

A set of functions is said to be orthogonal if

\langle g_i, g_j \rangle = 0 \quad \text{for } i \neq j, \qquad \langle g_i, g_j \rangle \neq 0 \quad \text{for } i = j.    (9)

Additionally, a set of orthogonal functions is said to be orthonormal if

\| g_i \| = 1, \qquad i = 1, \ldots, n.    (10)

In general, however, the set of functions g_1, \ldots, g_n resulting from the choice of the basis functions and their centers is not orthogonal. The underpinning idea behind the proposed method for calculating the RBF network output weights is to transform the network radial functions g_i into a set of orthonormal basis functions. Namely, the standard Gram–Schmidt orthonormalization algorithm is applied to the radial functions g_i in order to obtain an orthonormal set of basis functions u_k:

u_1 = \frac{g_1}{\| g_1 \|}, \qquad u_k = \frac{ g_k - \sum_{i=1}^{k-1} \langle g_k, u_i \rangle \, u_i }{ \left\| g_k - \sum_{i=1}^{k-1} \langle g_k, u_i \rangle \, u_i \right\| }, \qquad k = 2, \ldots, n    (11)

where it is assumed that the set of functions g_i in (6) is linearly independent, i.e., the locations of their centers are distinct. The resulting set of functions u_k is orthonormal according to the definitions given in (9) and (10). Note that the functions u_k can be expressed as linear combinations of the set of radial functions. Hence

u_k = \sum_{i=1}^{k} b_{ki} \, g_i    (12)

where the coefficients b_{ki} are calculated from the equation set (11). Now, if the set of orthonormal functions u_k is used as the new set of kernel functions in the RBF network hidden layer [i.e., the functions g_i are substituted by the proper expressions for the functions u_k], the solution of (6) assumes the following simple form:

w_k' = \langle d, u_k \rangle, \qquad k = 1, \ldots, n    (13)

where w_1', \ldots, w_n' is the set of new weights which linearly combine the signals generated at the new hidden layer composed of the orthonormal functions u_k.

TABLE I
IRIS DATA CLASSIFICATION RESULTS OBTAINED FOR 15 RBF KERNEL NODES
WITH THE USE OF THE PROPOSED KERNEL ORTHOGONALIZATION SCHEME

Fig. 2. Kernel orthonormalization procedure of the RBF network.

Thus, the overall transfer function of the RBF network with orthonormalized hidden kernel nodes can be expressed as

y(\mathbf{x}) = \sum_{k=1}^{n} w_k' \, u_k(\mathbf{x}).    (14)

A graphical interpretation of the described RBF kernel orthonormalization procedure is given in Fig. 2. Now, after the expression for u_k given in (12) is substituted into (14), one gets

y(\mathbf{x}) = \sum_{k=1}^{n} w_k' \sum_{i=1}^{k} b_{ki} \, g_i(\mathbf{x})    (15)

and, by moving w_k' under the second summation sign, the network transfer function may be expressed as

y(\mathbf{x}) = \sum_{k=1}^{n} \sum_{i=1}^{k} w_k' \, b_{ki} \, g_i(\mathbf{x}).    (16)

Then, by grouping appropriate terms in the last expression, one gets

y(\mathbf{x}) = \sum_{i=1}^{n} \left( \sum_{k=i}^{n} w_k' \, b_{ki} \right) g_i(\mathbf{x})    (17)

and, by comparing (1) and (17), the following formula is obtained:

w_i = \sum_{k=i}^{n} w_k' \, b_{ki}    (18)

which simply implies that the overall transfer function of the RBF network after the orthonormalization procedure is performed, i.e., (17), can be expressed in a form identical to that given in (1). Accordingly, the resultant network structure will be identical to the initial structure of the RBF network used, but with the connection weights computed from (18); see also Fig. 1.

The presented orthonormalization method has a simple extension to a multivariable network output. In such a case, the calculations of the weights need to be repeated, in an identical manner to the sequence (6)–(18), for each element of the output vector. Obviously, in general, the computed weight values for each of the network output linear combiner nodes will be different. Also, see the Appendix, where a recursive calculation procedure is suggested that eliminates the need for repetitive computation of function inner products in the set of equations (11).
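Under the notation introduced above, a compact NumPy sketch of the whole procedure might look as follows: the columns of G hold the kernel responses g_i at the training points, the Gram–Schmidt step implements (11), the coefficients b_{ki} of (12) are accumulated row by row, the orthonormal-basis weights follow from (13), and the weights of the original network are recovered from (18). The vectorized formulation and the variable names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def rbf_weights_by_orthonormalization(G, d, eps=1e-12):
    # G: P x n matrix with columns g_i evaluated at the training points,
    #    G[p, i] = g_i(x_p); d: NumPy vector of the P desired outputs.
    d = np.asarray(d, dtype=float)
    P, n = G.shape
    U = np.zeros((P, n))        # columns: orthonormal functions u_k, Eq. (11)
    B = np.zeros((n, n))        # B[k, i] = b_{ki}, Eq. (12): u_k = sum_i b_{ki} g_i
    w_prime = np.zeros(n)       # weights of the orthonormalized network, Eq. (13)

    for k in range(n):
        # Gram-Schmidt step (11): remove projections onto u_1, ..., u_{k-1}
        proj = U[:, :k].T @ G[:, k]            # inner products <g_k, u_i>
        v = G[:, k] - U[:, :k] @ proj
        b_row = np.zeros(n)
        b_row[k] = 1.0
        b_row[:k] = -B[:k, :k].T @ proj        # accumulate coefficients b_{ki}
        norm_v = np.linalg.norm(v)
        if norm_v < eps:
            raise ValueError("kernel responses are linearly dependent")
        U[:, k] = v / norm_v
        B[k, :] = b_row / norm_v
        w_prime[k] = d @ U[:, k]               # Eq. (13): w'_k = <d, u_k>

    # Eq. (18): map back to the weights of the original (non-orthogonal) RBF net
    w = B.T @ w_prime
    return w, w_prime, B
```

On small examples this routine returns, up to numerical error, the same weight vector as the pseudoinverse sketch given in Section II, while adding a further kernel node only appends one column to G, one new orthonormal function, and one new weight w'.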

IV. RESULTS AND SUMMARY

The presented RBF network orthonormalization procedure has been tested on two data classification problems and one dealing with chaotic time series prediction.
A. Classification of Irises
This popular benchmark relies on classification of three
classes of iris flowers (called iris setosa, iris versicolor, and iris
virginica) on the basis of four features measured for each of
the classes: sepal length and width and petal length and width.
There are 50 feature vectors with four elements each for each
class in the data set. Forty feature vectors chosen randomly
from each class were used for RBF network training and the
remaining data served for network testing (i.e., ten vectors
from each class). The network had four inputs, Gaussian
kernels in the hidden layer and three network outputs used
to account for three different flower classes. An iris was
considered to belong to a certain class if its corresponding network output was ON and the two other outputs remained OFF. Thus, the three target vectors were (0, 0, 1), (0, 1, 0), and (1, 0, 0).
Network centers were chosen according to a uniform probability distribution spanning the range of the flower feature values. Calculation of the network output layer weights started from five hidden kernels, and then one hidden node was added at a time to maximize the fit to the training data. It was observed that, for the testing set, network performance ceased to improve beyond 15 kernel nodes. The results obtained are gathered in Table I. A network response was considered correct if the network output node associated with the given iris class exceeded the 0.5 threshold and the remaining two output nodes remained below this threshold.
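Stated in code, the above correctness criterion might be checked as follows; the array shapes and names are assumptions for illustration only.

```python
import numpy as np

def count_correct(Y, T, threshold=0.5):
    # Y: N x 3 array of network outputs; T: N x 3 one-hot targets, e.g. (0, 0, 1).
    # A response is correct when the target class output exceeds the threshold
    # and the remaining two outputs stay below it.
    on_ok = Y[T == 1] > threshold
    off_ok = np.all(np.where(T == 0, Y < threshold, True), axis=1)
    return int(np.sum(on_ok & off_ok))
```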

TABLE II
COMPARISON OF COMPUTATIONAL COSTS FOR CALCULATING OUTPUT NETWORK WEIGHTS FOR THE PROPOSED ORTHONORMALIZATION SCHEME AND THE GAUSS–SEIDEL ELIMINATION METHOD


Additionally, for the tested benchmark, the computational cost involved in the calculation of the network output weights with the use of the proposed orthonormalization scheme was compared to the matrix inversion method based on Gauss–Seidel elimination. Table II summarizes the results of this comparison. The number of RBF kernel nodes was increased successively by one node at a time, from five up to 30. For 15 kernel nodes, for which the network performance reached its plateau (see the row printed in bold in Table II), the computing cost of the Gauss–Seidel method was more than threefold for the number of multiplications, and almost twofold for the number of additions, when compared to the orthonormalization method [calculations of both the weights w_k' and w_i were considered; see (13) and (18)]. Note that this difference becomes larger with an increasing number of basis functions.
B. The Two-Spiral Benchmark
In the two spiral benchmark the task of the tested data
classifier is to separate a rectangular fragment of a two dimensional space into disjoint areas, each containing a different
spiral curve as shown by the training data points in Fig 3(a).
This is an extremely hard task to solve by an MLP network
trained with the backpropagation algorithm due to complex
shapes formed by the data points. There were 200 data points
given by
coordinates representing each of the spirals.
Accordingly, the target set consisted of 200 desired values
for one spiral [open circles in Fig. 3(a)]
coded HIGH
and 200 values coded LOW
for the second [stars
in Fig. 3(a)]. A two input (one input for a coordinate) and
one output RBF network with Gaussian kernels was used
for spiral separation. Centers of the basis functions were
drawn randomly from a two-dimensional Gaussian distribution
centered at the spirals foci with variance equal to 0.25. By
incrementing the number of RBF network weights it was found
that for the tested kernel orthonormalization procedure 64 basis
functions were sufficient to produce very confident classification results as illustrated in Fig. 3(b). Network output for each

Fig. 3. (a) Two spiral-shaped classes of points used for learning an RBF network. (b) Separation of the spirals obtained for 64 RBF kernel functions trained with the orthonormalization procedure proposed in the paper; there are 41 × 41 testing points in the classified area.

The network output for each of the tested points is gray-level coded (BLACK = LOW and WHITE = HIGH).
C. Time Series Prediction
Time series prediction of the quadratic map
, originally proposed for biological systems
modeling, is known to generate chaotic dynamics on the interval
[21]. Prediction of this time series is recognized
as a demanding benchmark for testing various neural-network
architectures [7], [22]. Because this is a first-order process,
an RBF network used for prediction of this series has one
input, connected to a sample at discrete time instance
and is asked to predict the next point in the series at time
. Centers of network basis functions were drawn from a
uniform distribution in the range
. Data generated from
500 iterations of the map was used for training. The true
network ability to discover an underlying determinism in this
chaotic time series was tested on a different set of 500 points.
Network predicting performance for the test set is shown in
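Reproducing this setup requires the chaotic series and one-step-ahead input/target pairs. The exact map coefficients are not restated here, so the fully chaotic logistic form x(k+1) = 4 x(k) (1 − x(k)) [21] is assumed in the sketch below purely for illustration.

```python
import numpy as np

def quadratic_map_series(n_samples, x0=0.3):
    # Assumed quadratic (logistic) map in its fully chaotic regime [21];
    # the specific form used in the paper is taken here as an assumption.
    x = np.empty(n_samples)
    x[0] = x0
    for k in range(n_samples - 1):
        x[k + 1] = 4.0 * x[k] * (1.0 - x[k])
    return x

def one_step_pairs(series):
    # Network input: sample at time k; target: sample at time k + 1.
    return series[:-1].reshape(-1, 1), series[1:]

train_series = quadratic_map_series(501)        # yields 500 training pairs
X_train, d_train = one_step_pairs(train_series)
```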

TABLE III
ONE-STEP-AHEAD PREDICTION OF THE CHAOTIC TIME SERIES—QUANTITATIVE COMPARISON FOR DIFFERENT NUMBERS OF NETWORK HIDDEN NODES

Fig. 4. The quadratic map (solid line) and its approximation by an RBF
network with five Gaussian kernels (centers of the open circles).

Network prediction performance for the test set is shown in the form of a phase plot in Fig. 4. The RBF network one-step-ahead predictor achieved a very close approximation of the quadratic function (indicated by open circles). These results were produced with five Gaussian kernels in the network hidden layer. A quantitative evaluation of this performance is gathered in Table III, which compares network prediction quality for different numbers of hidden nodes on the basis of two error measures for both the testing and the training set. The error measures are the mean square error (MSE) and the cross-correlation coefficient calculated according to the formula

R = \frac{ \sum_{p=1}^{N} (y_p - \bar{d})(d_p - \bar{d}) }{ \left[ \sum_{p=1}^{N} (y_p - \bar{d})^2 \sum_{p=1}^{N} (d_p - \bar{d})^2 \right]^{1/2} }    (19)

where N is the number of test points, d_p are the values generated by the quadratic map for the testing set, y_p are the corresponding network predictions, and \bar{d} is the mean value of the test set. Observe that, even for three hidden nodes, an RBF network trained according to the proposed orthonormalization procedure yields very accurate prediction results for the testing set.

D. Summary

An efficient noniterative technique for computation of the RBF network weights has been proposed, in which the RBF kernels are transformed into an orthonormal set of functions. This has allowed the requirement to compute the off-diagonal terms in the linear set of equations for the optimum network weight set to be eliminated. Consequently, incorporation of new network hidden nodes aimed at improving network performance does not require recomputation of the network weights already calculated. This property of the proposed method allows for a very efficient network training procedure, in which network hidden nodes are added one at a time until an adequate error goal is reached (see the Iris classification benchmark). Additionally, the method has low storage requirements, since the weights can be computed recursively (see the Appendix). Furthermore, the proposed computation procedure can be organized in a parallel manner, which is a very favorable feature if on-line applications or hardware implementations of the method are envisioned.

A novel element of the presented method is the combination of the kernel orthonormalization algorithm for weight calculation with the procedure allowing restoration of the original RBF network structure (with not necessarily orthogonal kernels) once the weight computation has been completed [see (18)]. Hence, a useful property of an RBF network, namely the possibility to evaluate the contribution of each single basis function to the overall network output, has been retained. Excellent performance of the proposed RBF network training method has been confirmed on a number of benchmarks.

APPENDIX
OPTIMIZATION OF THE ORTHONORMALIZATION PROCEDURE

Further savings on the computational load of the proposed method can be gained by eliminating the need for repetitive computation of inner products in (11). This is achieved by substituting the numerators in (11) by the proper expressions for the orthonormal functions u_i given in (12):

g_k - \sum_{i=1}^{k-1} \langle g_k, u_i \rangle \, u_i = g_k - \sum_{i=1}^{k-1} \langle g_k, u_i \rangle \sum_{j=1}^{i} b_{ij} \, g_j.    (20)

By substituting (20) into (11) one gets

u_k = b_{kk} \left( g_k - \sum_{i=1}^{k-1} \langle g_k, u_i \rangle \sum_{j=1}^{i} b_{ij} \, g_j \right)    (21)

where, by comparison with (12), the off-diagonal coefficients are

b_{kj} = - \, b_{kk} \sum_{i=j}^{k-1} \langle g_k, u_i \rangle \, b_{ij}, \qquad j = 1, \ldots, k-1    (22)

and from (20) the diagonal coefficient is

b_{kk} = \left[ \langle g_k, g_k \rangle - \sum_{i=1}^{k-1} \langle g_k, u_i \rangle^2 \right]^{-1/2}    (23)

in which, by using (12), the inner products on the right-hand side of (23) are calculated in the following way:

\langle g_k, u_i \rangle = \sum_{j=1}^{i} b_{ij} \, \langle g_k, g_j \rangle.    (24)

Equations (22) and (23) now allow for the following recursive calculation of the coefficients b_{ki}. First, it is noted that b_{11} is equal to the inverse of the norm \| g_1 \| defined in (8). Then, for k = 2, the inner product \langle g_2, u_1 \rangle is obtained from (24), the coefficient b_{22} from (23), and the coefficient b_{21} from (22). Further calculations proceed in a similar manner until the values of the coefficients b_{n1}, \ldots, b_{nn} are finally calculated. Once all values of the coefficients b_{ki} are known, the weight values w_k' can be calculated by substituting into (13) the expressions for u_k given in (12). This finally yields

w_k' = \sum_{i=1}^{k} b_{ki} \, \langle d, g_i \rangle.    (25)

Recall that the primed weights are different from those used in the original expression for the RBF network transfer function (1); compare also Figs. 1 and 2.
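To illustrate the practical consequence of (13) and of the recursion above, the sketch below adds kernel nodes one at a time: the previously computed orthonormal functions, coefficients b_{ki}, and weights w'_k are never recomputed, only one new row of coefficients and one new weight are appended. As with the earlier sketches, the NumPy formulation is an illustrative assumption rather than the authors' code.

```python
import numpy as np

class IncrementalOrthonormalRBF:
    # Grows the hidden layer one kernel node at a time; previously computed
    # orthonormal functions u_k and weights w'_k (13) stay untouched.
    def __init__(self, d):
        self.d = np.asarray(d, dtype=float)   # desired outputs over the P training points
        self.U = np.empty((len(self.d), 0))   # columns: u_k evaluated at the training points
        self.B = np.empty((0, 0))             # coefficients b_{ki} of Eq. (12)
        self.w_prime = np.empty(0)            # weights of the orthonormalized network

    def add_node(self, g_new):
        # g_new: responses g_{n+1}(x_p) of the candidate kernel node, length P
        proj = self.U.T @ g_new               # inner products <g_new, u_i>, cf. (24)
        v = g_new - self.U @ proj             # Gram-Schmidt numerator, cf. (20)
        norm_v = np.linalg.norm(v)
        n = self.B.shape[0]
        b_row = np.zeros(n + 1)
        b_row[n] = 1.0
        b_row[:n] = -self.B.T @ proj          # off-diagonal coefficients, cf. (22)
        b_row /= norm_v                       # normalization, cf. (23)
        self.U = np.column_stack([self.U, v / norm_v])
        self.B = np.block([[self.B, np.zeros((n, 1))], [b_row]])
        self.w_prime = np.append(self.w_prime, self.d @ self.U[:, -1])  # Eq. (13)
        return self.training_error()

    def weights(self):
        # Eq. (18): weights of the equivalent network with the original kernels
        return self.B.T @ self.w_prime

    def training_error(self):
        # Sum squared error (5) of the current model on the training set
        y = self.U @ self.w_prime
        return float(np.sum((y - self.d) ** 2))
```

This is the loop suggested by the Iris experiment: nodes are appended until the training-error goal is reached, and the final call to weights() restores the original RBF network structure.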
REFERENCES

[1] D. R. Hush and B. G. Horne, "Progress in supervised neural networks: What's new since Lippmann?," IEEE Signal Processing Mag., vol. 10, pp. 8–39, Jan. 1993.
[2] M. Bianchini, P. Frasconi, and M. Gori, "Learning without local minima in radial basis function networks," IEEE Trans. Neural Networks, vol. 6, pp. 749–755, May 1995.
[3] T. Chen and H. Chen, "Approximation capability to functions of several variables, nonlinear functionals, and operators by radial basis function neural networks," IEEE Trans. Neural Networks, vol. 6, pp. 904–910, July 1995.
[4] D. Gorinevsky, "On the persistency of excitation in radial basis function network identification of nonlinear systems," IEEE Trans. Neural Networks, vol. 6, pp. 1237–1244, Sept. 1995.
[5] E. S. Chng, S. Chen, and B. Mulgrew, "Gradient radial basis function networks for nonstationary nonlinear time series prediction," IEEE Trans. Neural Networks, vol. 7, pp. 190–194, Jan. 1996.
[6] M. J. D. Powell, "Radial basis functions for multivariable interpolation: A review," in Algorithms for Approximation of Functions and Data, J. C. Mason and M. G. Cox, Eds. Oxford, U.K.: Oxford Univ. Press, 1987, pp. 143–167.
[7] D. S. Broomhead and D. Lowe, "Multivariable functional interpolation and adaptive networks," Complex Syst., vol. 2, pp. 321–355, 1988.
[8] T. Poggio and F. Girosi, "Networks for approximation and learning," Proc. IEEE, vol. 78, no. 9, pp. 1481–1497, Sept. 1990.
[9] H. Demuth and M. Beale, Neural Network Toolbox User's Guide. The MathWorks, Inc., 1994.
[10] J. E. Moody and C. J. Darken, "Fast learning in networks of locally-tuned processing units," Neural Computation, vol. 1, pp. 281–294, 1989.
[11] A. Ralston, Mathematical Methods for Digital Computers. New York: Wiley, 1960.
[12] M. J. Korenberg, "Orthogonal approaches to time-series analysis and system identification," IEEE Signal Processing Mag., pp. 29–43, July 1991.
[13] S. Chen, C. F. N. Cowan, and P. M. Grant, "Orthogonal least squares learning algorithm for radial basis function networks," IEEE Trans. Neural Networks, vol. 2, pp. 302–309, Mar. 1991.
[14] P. P. Kanjilal and D. N. Banerjee, "On the application of orthogonal transformation for the design and analysis of feedforward networks," IEEE Trans. Neural Networks, vol. 6, pp. 1061–1070, Sept. 1995.
[15] A. Levin, T. K. Leen, and J. E. Moody, "Fast pruning using principal components," in Advances in Neural Information Processing Systems 6, J. D. Cowan, G. Tesauro, and J. Alspector, Eds. San Francisco, CA: Morgan Kaufmann, 1994.
[16] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.
[17] B. A. Whitehead and T. D. Choate, "Cooperative-competitive genetic evolution of radial basis function centers and widths for time series prediction," IEEE Trans. Neural Networks, vol. 7, pp. 869–880, July 1996.
[18] C. R. Rao and S. K. Mitra, Generalized Inverse of Matrices and Its Applications. New York: Wiley, 1971.
[19] G. H. Golub and C. F. Van Loan, Matrix Computations, 2nd ed. Baltimore, MD: Johns Hopkins Univ. Press, 1989.
[20] B. Widrow and M. A. Lehr, "30 years of adaptive neural networks: Perceptron, madaline, and backpropagation," Proc. IEEE, vol. 78, pp. 1415–1442, 1990.
[21] R. M. May, "Simple mathematical models with very complicated dynamics," Nature, vol. 261, pp. 459–467, 1976.
[22] A. Lapedes and R. Farber, "Nonlinear signal processing using neural networks: Prediction and system modeling," Los Alamos National Laboratory, Los Alamos, NM, Tech. Rep. LA-UR-87-2662, 1988.

Wladyslaw Kaminski was born in Lodz, Poland, in 1949. He received the Ph.D. degree from the Technical University of Lodz and the D.Sc. degree from the Technical University of Silesia, Poland, both in chemical engineering, in 1977 and 1989, respectively.
He was a Visiting Professor at a number of universities worldwide (e.g., the RWTH Institute of Chemical Engineering, Aachen, Germany, in 1992; the Higher National School of Food and Agriculture Industry, Massy, France, in 1994; and the University of Canterbury, New Zealand, in 1996). He is currently a Professor in the Faculty of Process and Environmental Engineering, Technical University of Lodz, Poland. He is an author or coauthor of six patents, 115 papers, three student handbooks, five chapters in professional handbooks, and 24 research projects for industry. His main research fields are optimization and mathematical modeling of heat and mass transfer processes, drying, spouted- and fluidized-bed processes, and membrane processes. His recent interests include artificial neural network theory and its applications to the modeling of engineering processes.
Dr. Kaminski holds membership in the Polish Association of Chemical Engineers (PTCh) and the Polish Consultants Society (TKP).

Pawel Strumillo was born in Lodz, Poland, in 1959. He received the M.Sc. degree in electrical engineering from the Technical University of Lodz and the Ph.D. degree in technical sciences from the University of Strathclyde, Glasgow, U.K., in 1983 and 1993, respectively.
He is currently an Assistant Professor with the Biomedical Engineering Group, Institute of Electronics, Technical University of Lodz, Poland. He has published approximately 30 technical articles. His research interests include biomedical signal/image processing and artificial neural network theory and applications.
Dr. Strumillo holds membership in the Polish Society of Medical Informatics (PTIM) and the Polish Neural Networks Society (TSSN).
