
UNIVERSITY OF CAMBRIDGE

Numerical Analysis Reports

A radial basis function method for global optimization
H.-M. Gutmann

DAMTP 1999/NA22
December, 1999

Department of Applied Mathematics and Theoretical Physics


Silver Street
Cambridge CB3 9EW
England

DAMTP 1999/NA22

A radial basis function method for global optimization


H.-M. Gutmann
Abstract:

We introduce a method that aims to find the global minimum of a continuous nonconvex function on a compact subset of $\mathbb{R}^d$. It is assumed that function evaluations are expensive and that no additional information is available. Radial basis function interpolation is used to define a utility function. The maximizer of this function is the next point where the objective function is evaluated. We show that, for most types of radial basis functions that are considered in this paper, convergence can be achieved without further assumptions on the objective function. Besides, it turns out that our method is closely related to a statistical global optimization method, the P-algorithm. A general framework for both methods is presented. Finally, a few numerical examples show that, on the set of Dixon-Szegő test functions, our method yields favourable results in comparison to other global optimization methods.

Department of Applied Mathematics and Theoretical Physics,


University of Cambridge,
Silver Street,
Cambridge CB3 9EW,
England.
December, 1999.


1 Introduction
Global optimization has attracted a lot of attention in the last 20 years. In
many applications, the objective function is nonlinear and nonconvex. Often,
the number of local minima is large. Therefore standard nonlinear programming
methods may fail to locate the global minimum.
In its most general form, the Global Optimization Problem can be stated as

find $x^* \in D$ such that $f(x^*) \le f(x)$ for all $x \in D$,   (GOP)

where $D \subset \mathbb{R}^d$ is compact and $f : D \to \mathbb{R}$ is a continuous function defined on $D$. Under these assumptions, (GOP) is solvable, because $f$ attains its minimum on $D$.
Numerous methods to solve (GOP) have been developed (see e.g. Horst and Pardalos [4] and Törn and Žilinskas [19]). Stochastic methods like simulated annealing and genetic algorithms, which use only function values, are very popular among users, although their rate of convergence is usually rather slow. Deterministic methods like Branch-and-Bound, however, assume that one can compute a lower bound of $f$ on a subset of $D$. This can be done, for example, when a Lipschitz constant of $f$ is available. These further assumptions make such methods very powerful, but often they are not satisfied, or it is too expensive to provide the necessary information.
For the method investigated in this paper, we have in mind problems where the only information available is the possibility to evaluate the objective function, and each evaluation is very expensive. This may mean that it takes several hours to calculate a function value. For example, a function evaluation at a point may be done by building an experiment, by running a long computer simulation or by using a finite element method. Therefore, as the duration of the optimization process is dominated by the function evaluations, our goal is to require as few function evaluations as possible to find an adequate estimate of the global minimum.
The method is based on a general technique proposed by Jones [8]. Let $\mathcal{A}$ be a linear space of functions, and assume that, for $s \in \mathcal{A}$, $\sigma(s)$ is a measure of the "bumpiness" of $s$. Now assume that we have calculated $x_1, \ldots, x_n$ and the function values $f(x_1), \ldots, f(x_n)$. A target value $f^*$ is chosen that can be regarded as an estimate of the optimal value, but it might be very crude. For each $y \notin \{x_1, \ldots, x_n\}$, let $s_y \in \mathcal{A}$ be defined by the interpolation conditions
$$ s_y(x_i) = f(x_i), \quad i = 1, \ldots, n, \qquad s_y(y) = f^*. $$   (1.1)
The new point $x_{n+1}$ is chosen to be the value of $y$ that minimizes $\sigma(s_y)$, $y \notin \{x_1, \ldots, x_n\}$. Thus the view is taken that the "least bumpy" of the functions $s_y$ yields the most useful location of the new point.


We will use radial basis functions as interpolants. Their interpolation properties are very suitable. Specifically, the uniqueness of an interpolant is achieved under very mild conditions on the location of the interpolation points, and a measure of bumpiness is also available.

Close relations can be established between our method and one from statistical global optimization, namely the P-algorithm (Žilinskas [22]). Although it is derived using a completely different approach, it is very similar to our method. One special case of a P-algorithm, developed by Kushner [12], is even equivalent to a special case of our radial basis function method.

Other global optimization methods based on radial basis functions have been developed. Alotto et al. [1] use interpolation by multiquadrics to accelerate a simulated annealing method. Ishikawa et al. [6], [7] employ radial basis functions to estimate the global minimizer and run an SQP algorithm to locate it.

The properties of radial basis functions that are necessary for the description of our method are introduced in the following section. In particular, we address the question of interpolation and introduce a suitable measure of "bumpiness". The global optimization method is described in detail in Section 3. Convergence of the method is the subject of Section 4. Subsection 4.1 contains various convergence results, and the proof of the main theorem can be found in Subsection 4.2. The relation between our method and the P-algorithm is addressed in Section 5. The final section deals with search strategies, but a complete analysis is beyond the scope of this paper.

2 Interpolation by radial basis functions and a measure of bumpiness

Let $n$ pairwise different points $x_1, \ldots, x_n \in \mathbb{R}^d$ and data $f_1, \ldots, f_n \in \mathbb{R}$ be given, where $n$ and $d$ are any positive integers. We seek a function $s$ of the form
$$ s(x) = \sum_{i=1}^{n} \lambda_i \, \phi(\|x - x_i\|) + p(x), \quad x \in \mathbb{R}^d, $$   (2.1)
that interpolates the data $(x_1, f_1), \ldots, (x_n, f_n)$. The coefficients $\lambda_i,\ i = 1, \ldots, n$, are real numbers, and $p$ is from $\Pi_m$, the space of polynomials of degree less than or equal to $m$. The norm $\|\cdot\|$ is the Euclidean norm in $\mathbb{R}^d$. The following choices of $\phi$ are considered:
$$ \begin{aligned} \phi(r) &= r && \text{(linear)}, \\ \phi(r) &= r^3 && \text{(cubic)}, \\ \phi(r) &= r^2 \log r && \text{(thin plate spline)}, \\ \phi(r) &= \sqrt{r^2 + \gamma^2} && \text{(multiquadric)}, \\ \phi(r) &= e^{-\gamma r^2} && \text{(Gaussian)}, \end{aligned} \qquad r \ge 0, $$   (2.2)

where $\gamma$ is a prescribed positive constant. Let the matrix $\Phi \in \mathbb{R}^{n \times n}$ be defined by
$$ (\Phi)_{ij} := \phi(\|x_i - x_j\|), \quad i, j = 1, \ldots, n. $$   (2.3)
Further, we introduce the linear space $V_m \subseteq \mathbb{R}^n$ containing all $\lambda \in \mathbb{R}^n$ that satisfy
$$ \sum_{i=1}^{n} \lambda_i \, q(x_i) = 0 \quad \forall\, q \in \Pi_m. $$   (2.4)
Formally, we set $V_{-1} := \mathbb{R}^n$. Obviously, $V_{m+1} \subseteq V_m$ for all $m \ge -1$. Powell [15] shows that, in the cubic and thin plate spline cases,
$$ \lambda^T \Phi \lambda > 0 \quad \forall\, \lambda \in V_1 \setminus \{0\}, $$   (2.5)
in the linear and multiquadric cases,
$$ \lambda^T \Phi \lambda < 0 \quad \forall\, \lambda \in V_0 \setminus \{0\}, $$   (2.6)
and in the Gaussian case
$$ \lambda^T \Phi \lambda > 0 \quad \forall\, \lambda \in \mathbb{R}^n \setminus \{0\}. $$   (2.7)
We let $m_0$ be 1 in the cubic and thin plate spline cases, 0 in the linear and multiquadric cases, and $-1$ in the Gaussian case. Then the inequalities (2.5)--(2.7) can be merged into
$$ (-1)^{m_0+1} \, \lambda^T \Phi \lambda > 0 \quad \forall\, \lambda \in V_{m_0} \setminus \{0\}. $$   (2.8)
After choosing $\phi$, we let $m$ be an integer that is not less than $m_0$, and $\lambda$ is confined to $V_m$.
Let $\hat m$ be the dimension of $\Pi_m$, let $p_1, \ldots, p_{\hat m}$ be a basis of this linear space, and let $P$ be the matrix
$$ P := \begin{pmatrix} p_1(x_1) & \cdots & p_{\hat m}(x_1) \\ \vdots & & \vdots \\ p_1(x_n) & \cdots & p_{\hat m}(x_n) \end{pmatrix}. $$   (2.9)
Then it can be shown (see [15]) that the matrix
$$ A = \begin{pmatrix} \Phi & P \\ P^T & 0 \end{pmatrix} \in \mathbb{R}^{(n+\hat m) \times (n+\hat m)} $$   (2.10)
is nonsingular if and only if $x_1, \ldots, x_n$ satisfy
$$ q \in \Pi_m \ \text{ and } \ q(x_i) = 0, \ i = 1, \ldots, n, \ \Longrightarrow \ q \equiv 0. $$   (2.11)


In the Gaussian case with $m = -1$, $P$ and condition (2.11) are omitted. Therefore the coefficients of the function $s$ in (2.1) are defined uniquely by the system
$$ s(x_i) = f_i, \quad i = 1, \ldots, n, \qquad \sum_{i=1}^{n} \lambda_i \, p_j(x_i) = 0, \quad j = 1, \ldots, \hat m. $$   (2.12)
Let $F$ be the vector whose entries are the data values $f_1, \ldots, f_n$. Then the system (2.12) becomes
$$ \begin{pmatrix} \Phi & P \\ P^T & 0 \end{pmatrix} \begin{pmatrix} \lambda \\ c \end{pmatrix} = \begin{pmatrix} F \\ 0_{\hat m} \end{pmatrix}, $$   (2.13)
where $\lambda = (\lambda_1, \ldots, \lambda_n)^T \in \mathbb{R}^n$, $c \in \mathbb{R}^{\hat m}$ and $0_{\hat m}$ is the zero vector in $\mathbb{R}^{\hat m}$. The components of $c$ are the coefficients of the polynomial $p$ with respect to the basis $p_1, \ldots, p_{\hat m}$.
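For concreteness, the following sketch assembles and solves the linear system (2.13) in the cubic case ($\phi(r) = r^3$, $m = 1$). It is an illustration of ours, not part of the paper; the function name `rbf_interpolant` and the use of a dense solver are our own choices.

```python
import numpy as np

def rbf_interpolant(X, f, phi=lambda r: r**3):
    """Solve system (2.13) for the coefficients (lambda, c) of the
    interpolant s(x) = sum_i lambda_i phi(||x - x_i||) + p(x),
    with p a linear polynomial (m = 1, so m_hat = d + 1)."""
    n, d = X.shape
    # Phi_ij = phi(||x_i - x_j||), definition (2.3)
    Phi = phi(np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2))
    # rows of P are (1, x_i^T), a basis of Pi_1, definition (2.9)
    P = np.hstack([np.ones((n, 1)), X])
    m_hat = d + 1
    A = np.block([[Phi, P], [P.T, np.zeros((m_hat, m_hat))]])
    rhs = np.concatenate([f, np.zeros(m_hat)])
    coef = np.linalg.solve(A, rhs)
    lam, c = coef[:n], coef[n:]

    def s(y):
        r = np.linalg.norm(X - y, axis=1)
        return phi(r) @ lam + c[0] + c[1:] @ y
    return s, lam, c
```

Any other $\phi$ from (2.2), together with a basis of the corresponding $\Pi_m$, could be substituted in the same way.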
The motivation for the measurement of the bumpiness of a radial basis function interpolant can be developed from the theory of natural cubic splines in one dimension. They can be written in the form (2.1), where $\phi(r) = r^3$, $\lambda \in V_1$ and $p \in \Pi_1$. It is well known (e.g. Powell [14]) that the interpolant $s$ that is defined by the system (2.12) minimizes $I(g) := \int_{\mathbb{R}} [g''(x)]^2 \, dx$ among all functions $g : \mathbb{R} \to \mathbb{R}$ that satisfy the interpolation conditions $g(x_i) = f_i,\ i = 1, \ldots, n$, and for which $I(g)$ exists and is finite. Therefore $I(g)$ is a suitable measure of bumpiness. The second derivative $s''$ is piecewise linear and vanishes outside a bounded interval. Thus one obtains by integration by parts
$$ I(s) = \int_{\mathbb{R}} [s''(x)]^2 \, dx = 12 \sum_{i=1}^{n} \lambda_i \, s(x_i) = 12 \sum_{i=1}^{n} \lambda_i \Big( \sum_{j=1}^{n} \lambda_j |x_i - x_j|^3 + p(x_i) \Big) = 12 \sum_{i,j=1}^{n} \lambda_i \lambda_j \, \phi_{ij}, $$
where the last equation follows from $\lambda \in V_1$. This relation suggests that expression (2.8) can provide a semi-inner product and a semi-norm for each $\phi$ in (2.2) and $m \ge m_0$. Also, the semi-norm will be the measure of bumpiness of a radial basis function (2.1). A semi-inner product $\langle \cdot, \cdot \rangle$ satisfies the same properties as an inner product, except that $\langle s, s \rangle = 0$ need not imply $s = 0$. Similarly, for a semi-norm $\|\cdot\|$, $\|s\| = 0$ does not imply $s = 0$.
We choose any radial basis function from (2.2) and $m \ge m_0$, and we define $A_{\phi,m}$ to be the linear space of all functions of the form
$$ \sum_{i=1}^{N} \lambda_i \, \phi(\|x - y_i\|) + p(x), \quad x \in \mathbb{R}^d, $$
where $N \in \mathbb{N}$, $y_1, \ldots, y_N \in \mathbb{R}^d$, $p \in \Pi_m$, and $\lambda = (\lambda_1, \ldots, \lambda_N)^T$ satisfies (2.4) for $n = N$. On this space, the semi-inner product and the semi-norm are defined as follows.

Let $s$ and $u$ be any functions in $A_{\phi,m}$, i.e.
$$ s(x) = \sum_{i=1}^{N(s)} \lambda_i \, \phi(\|x - y_i\|) + p(x) \quad \text{and} \quad u(x) = \sum_{j=1}^{N(u)} \mu_j \, \phi(\|x - z_j\|) + q(x). $$
We let the semi-inner product be the expression
$$ \langle s, u \rangle := (-1)^{m_0+1} \sum_{i=1}^{N(s)} \lambda_i \, u(y_i). $$   (2.14)
Clearly, it is bilinear. To show symmetry, we use
$$ \sum_{i=1}^{N(s)} \lambda_i \, q(y_i) = 0 \quad \text{and} \quad \sum_{j=1}^{N(u)} \mu_j \, p(z_j) = 0 $$
to deduce
$$ \begin{aligned} \langle s, u \rangle &= (-1)^{m_0+1} \sum_{i=1}^{N(s)} \lambda_i \Big( \sum_{j=1}^{N(u)} \mu_j \, \phi(\|y_i - z_j\|) + q(y_i) \Big) \\ &= (-1)^{m_0+1} \sum_{i=1}^{N(s)} \sum_{j=1}^{N(u)} \lambda_i \mu_j \, \phi(\|y_i - z_j\|) \\ &= (-1)^{m_0+1} \sum_{j=1}^{N(u)} \mu_j \Big( \sum_{i=1}^{N(s)} \lambda_i \, \phi(\|z_j - y_i\|) + p(z_j) \Big) \\ &= (-1)^{m_0+1} \sum_{j=1}^{N(u)} \mu_j \, s(z_j) = \langle u, s \rangle. \end{aligned} $$
By (2.8),
$$ \langle s, s \rangle = (-1)^{m_0+1} \sum_{i=1}^{N(s)} \lambda_i \, s(y_i) = (-1)^{m_0+1} \sum_{i,j=1}^{N(s)} \lambda_i \lambda_j \, \phi(\|y_i - y_j\|) $$   (2.15)
is strictly positive if $\lambda \in V_m \setminus \{0\}$ and $m \ge m_0$, i.e. if $s \in A_{\phi,m}$ is not a polynomial in $\Pi_m$. Thus (2.14) is a semi-inner product on $A_{\phi,m}$ that induces the semi-norm $\langle s, s \rangle^{1/2}$ with null space $\Pi_m$ (for details see Powell [16] and Schaback [17]).
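As an aside of ours, for an interpolant with coefficients $\lambda \in V_m$ the polynomial terms in (2.15) drop out, so the squared semi-norm reduces to $(-1)^{m_0+1}\lambda^T\Phi\lambda$. A minimal sketch, under the same cubic-case assumptions as above:

```python
import numpy as np

def seminorm_sq(X, lam, phi=lambda r: r**3, m0=1):
    """Squared semi-norm <s, s> of an interpolant, expression (2.15).
    Since lam lies in V_m, the polynomial part is annihilated and only
    (-1)^(m0+1) * lam^T Phi lam remains."""
    Phi = phi(np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2))
    return (-1) ** (m0 + 1) * lam @ Phi @ lam
```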
In analogy to the variational principle for cubic splines in one dimension, mentioned above, there is a theorem which states that the given interpolant is the solution to a minimization problem.

Theorem 1 (Schaback [17]) Let $\phi$ be any radial basis function from (2.2), and let $m$ be chosen such that $m \ge m_0$. Given are points $x_1, \ldots, x_n$ in $\mathbb{R}^d$ having the property (2.11) and values $f_1, \ldots, f_n$ in $\mathbb{R}$. Let $s$ be the radial function of the form (2.1) that solves the system (2.12). Then $s$ minimizes the semi-norm $\langle g, g \rangle^{1/2}$ on the set of functions $g \in A_{\phi,m}$ that satisfy
$$ g(x_i) = f_i, \quad i = 1, \ldots, n. $$   (2.16)

3 A radial basis function method


It will be shown how radial basis functions can be used in the general method of Jones [8] to solve the problem (GOP) (cf. Powell [16]). As in Section 2, we pick $\phi$ from (2.2) and $m \ge m_0$. Let $p_1, \ldots, p_{\hat m}$ be a basis of $\Pi_m$, where $\hat m = \dim \Pi_m$. Assume we have chosen $x_1, \ldots, x_n \in D$ that satisfy (2.11), and we know the function values $f(x_1), \ldots, f(x_n)$. Let the function
$$ s_n(x) = \sum_{i=1}^{n} \lambda_i \, \phi(\|x - x_i\|) + p(x), \quad x \in \mathbb{R}^d, $$
interpolate $(x_1, f(x_1)), \ldots, (x_n, f(x_n))$. Our task is to determine $x_{n+1}$. For a target value $f_n^*$ and a point $y \in D \setminus \{x_1, \ldots, x_n\}$, the radial basis function $s_y$ that satisfies (1.1) can be written as
$$ s_y(x) = s_n(x) + [f_n^* - s_n(y)] \, \ell_n(y, x), \quad x \in \mathbb{R}^d, $$   (3.1)
where $\ell_n(y, x)$ is the radial basis function solution to the interpolation conditions
$$ \ell_n(y, x_i) = 0, \quad i = 1, \ldots, n, \qquad \ell_n(y, y) = 1. $$   (3.2)

Therefore $\ell_n(y, \cdot)$ can be expressed as
$$ \ell_n(y, x) = \sum_{i=1}^{n} \alpha_i(y) \, \phi(\|x - x_i\|) + \mu_n(y) \, \phi(\|x - y\|) + \sum_{i=1}^{\hat m} b_i(y) \, p_i(x), \quad x \in \mathbb{R}^d, $$   (3.3)
where the coefficients of $\ell_n(y, \cdot)$ are defined by the equations
$$ A(y) \begin{pmatrix} \alpha(y) \\ \mu_n(y) \\ b(y) \end{pmatrix} = \begin{pmatrix} 0_n \\ 1 \\ 0_{\hat m} \end{pmatrix}. $$   (3.4)
Here $\alpha(y) = (\alpha_1(y), \ldots, \alpha_n(y))^T \in \mathbb{R}^n$, $b(y) = (b_1(y), \ldots, b_{\hat m}(y))^T \in \mathbb{R}^{\hat m}$, $\mu_n(y) \in \mathbb{R}$, $0_n$ and $0_{\hat m}$ denote the zero vectors in $\mathbb{R}^n$ and $\mathbb{R}^{\hat m}$, respectively, and, as in equation (2.10), $A(y)$ has the form
$$ A(y) := \begin{pmatrix} \Phi(y) & P(y) \\ P(y)^T & 0 \end{pmatrix}. $$   (3.5)
Specifically, letting $u(y)$ and $\pi(y)$ be the vectors
$$ u(y) := \big( \phi(\|y - x_1\|), \ldots, \phi(\|y - x_n\|) \big)^T \quad \text{and} \quad \pi(y) := \big( p_1(y), \ldots, p_{\hat m}(y) \big)^T, $$
respectively, $\Phi(y)$ and $P(y)$ are the matrices
$$ \Phi(y) := \begin{pmatrix} \Phi & u(y) \\ u(y)^T & \phi(0) \end{pmatrix} \quad \text{and} \quad P(y) := \begin{pmatrix} P \\ \pi(y)^T \end{pmatrix}. $$   (3.6)

The square of the semi-norm $\langle s_y, s_y \rangle$ of the new interpolant (3.1), as defined in the previous section, has the value
$$ \langle s_y, s_y \rangle = \langle s_n, s_n \rangle + 2 [f_n^* - s_n(y)] \langle s_n, \ell_n(y, \cdot) \rangle + [f_n^* - s_n(y)]^2 \langle \ell_n(y, \cdot), \ell_n(y, \cdot) \rangle. $$
Equations (2.14) and (3.2) imply
$$ \langle s_n, \ell_n(y, \cdot) \rangle = (-1)^{m_0+1} \sum_{i=1}^{n} \lambda_i \, \ell_n(y, x_i) = 0, $$
and, using expressions (3.2) and (3.3), we find the expression
$$ \langle \ell_n(y, \cdot), \ell_n(y, \cdot) \rangle = (-1)^{m_0+1} \Big[ \sum_{i=1}^{n} \alpha_i(y) \, \ell_n(y, x_i) + \mu_n(y) \, \ell_n(y, y) \Big] = (-1)^{m_0+1} \mu_n(y). $$   (3.7)
Thus we deduce the formula
$$ \langle s_y, s_y \rangle = \langle s_n, s_n \rangle + (-1)^{m_0+1} \mu_n(y) \, [f_n^* - s_n(y)]^2. $$
Further, we define the function $g_n : D \setminus \{x_1, \ldots, x_n\} \to \mathbb{R}$ as the difference
$$ g_n(y) := \langle s_y, s_y \rangle - \langle s_n, s_n \rangle = (-1)^{m_0+1} \mu_n(y) \, [f_n^* - s_n(y)]^2, $$
which is nonnegative. Since $\langle s_n, s_n \rangle$ is independent of $y$, the required minimization of $\langle s_y, s_y \rangle$ and the minimization of $g_n(y)$ are equivalent.
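To make these quantities concrete, the sketch below obtains $\mu_n(y)$ from the bordered system (3.4) and then evaluates $g_n(y)$. It is only an illustration of the formulas under the same cubic-case assumptions as the earlier sketch; the names `mu_and_g` and the choice of a dense solver are ours.

```python
import numpy as np

def mu_and_g(X, lam, c, f_target, y, phi=lambda r: r**3, m0=1):
    """Solve the bordered system (3.4) for the cardinal function l_n(y, .)
    and return mu_n(y) together with g_n(y) of (3.8)."""
    n, d = X.shape
    Phi = phi(np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2))
    P = np.hstack([np.ones((n, 1)), X])            # basis of Pi_1
    u = phi(np.linalg.norm(X - y, axis=1))         # u(y) in (3.6)
    pi = np.concatenate([[1.0], y])                # pi(y) in (3.6)
    # A(y) from (3.5): Phi and P bordered by the row and column for y
    Phi_y = np.block([[Phi, u[:, None]], [u[None, :], [[phi(0.0)]]]])
    P_y = np.vstack([P, pi])
    A_y = np.block([[Phi_y, P_y], [P_y.T, np.zeros((d + 1, d + 1))]])
    rhs = np.zeros(n + 1 + d + 1)
    rhs[n] = 1.0                                   # enforces l_n(y, y) = 1
    coef = np.linalg.solve(A_y, rhs)
    mu = coef[n]                                   # mu_n(y)
    s_y = u @ lam + c[0] + c[1:] @ y               # s_n(y)
    g = (-1) ** (m0 + 1) * mu * (s_y - f_target) ** 2
    return mu, g
```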

The choice of $f_n^*$ determines the location of $x_{n+1}$. If
$$ f_n^* \ge \min_{y \in D} s_n(y), $$
then $g_n(y) = 0$ can be achieved. However, if
$$ f_n^* < \min_{y \in D} s_n(y), $$
then $x_{n+1}$ will be away from the points $x_i$, $i = 1, \ldots, n$. In particular, for $f_n^* \to -\infty$, we make the following deduction.

Remark 2 For $f_n^* < \min_{y \in D} s_n(y)$, let $x(f_n^*)$ be the minimizer of $g_n$, i.e.
$$ (-1)^{m_0+1} \mu_n(x(f_n^*)) \, [s_n(x(f_n^*)) - f_n^*]^2 \le (-1)^{m_0+1} \mu_n(y) \, [s_n(y) - f_n^*]^2 \quad \forall\, y \in D \setminus \{x_1, \ldots, x_n\}. $$
This is equivalent to
$$ (-1)^{m_0+1} \mu_n(x(f_n^*)) \le (-1)^{m_0+1} \mu_n(y) \left[ 1 + \frac{s_n(y) - s_n(x(f_n^*))}{s_n(x(f_n^*)) - f_n^*} \right]^2. $$
As $f_n^* \to -\infty$, the boundedness of $s_n$ on $D$ implies
$$ (-1)^{m_0+1} \mu_n(x(-\infty)) \le (-1)^{m_0+1} \mu_n(y) \quad \forall\, y \in D \setminus \{x_1, \ldots, x_n\}. $$
Therefore, the choice $f_n^* = -\infty$ requires the minimization of $(-1)^{m_0+1} \mu_n(y)$. This process puts $x_{n+1}$ in a large gap between the points $x_i$, $i = 1, \ldots, n$, a property that is of fundamental importance to global optimization.

The following basic algorithm employs the given method.

Algorithm 3

Initial step: Pick $\phi$ from (2.2) and $m \ge m_0$. Choose points $x_1, \ldots, x_n \in D$ that satisfy (2.11). Compute the radial function $s_n$ that minimizes $\langle s, s \rangle$ on $A_{\phi,m}$, subject to the interpolation conditions
$$ s(x_i) = f(x_i), \quad i = 1, \ldots, n. $$

Iteration step: $x_1, \ldots, x_n$ are the points in $D$ where the value of $f$ is known, and $s_n$ minimizes $\langle s, s \rangle$, subject to $s(x_i) = f(x_i)$, $i = 1, \ldots, n$.
Choose a target value $f_n^* \in [-\infty, \min_{y \in D} s_n(y)]$. (The choice $f_n^* = \min_{y \in D} s_n(y)$ is admissible only if none of the $x_i$ is a global minimizer of $s_n$.)
Calculate $x_{n+1}$, which is the value of $y$ that minimizes the function
$$ g_n(y) = (-1)^{m_0+1} \mu_n(y) \, [s_n(y) - f_n^*]^2, \quad y \in D \setminus \{x_1, \ldots, x_n\}. $$   (3.8)
Evaluate $f$ at $x_{n+1}$ and set $n := n + 1$.
Stop if $n$ is greater than a prescribed $n_{\max}$.
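A high-level sketch of this loop may help to fix ideas. It is an illustration of ours, not a reference implementation: the interpolation solve and the two global subproblems (minimizing $s_n$ over $D$ and minimizing $g_n$) are delegated to hypothetical callables that would have to be supplied, for example by a multistart local optimizer, and the target-value rule is left open (Section 6 discusses one strategy).

```python
def rbf_global_optimize(f, initial_points, n_max,
                        fit_interpolant, choose_target,
                        minimize_on_D, argmin_g):
    """Sketch of Algorithm 3.  `f` is the expensive objective,
    `initial_points` satisfy (2.11); the four callables encapsulate the
    subproblems that the paper leaves to the implementation."""
    X = list(initial_points)
    F = [f(x) for x in X]
    while len(X) < n_max:
        s_n = fit_interpolant(X, F)            # solve system (2.13)
        s_min = minimize_on_D(s_n)             # min_{y in D} s_n(y)
        f_target = choose_target(s_min, F)     # value in [-inf, s_min]
        x_next = argmin_g(s_n, X, f_target)    # minimize g_n of (3.8)
        X.append(x_next)
        F.append(f(x_next))                    # the only expensive step
    i_best = min(range(len(F)), key=F.__getitem__)
    return X[i_best], F[i_best]
```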

The function $g_n$ is infinitely differentiable on $D \setminus \{x_1, \ldots, x_n\}$, but it is not defined at the interpolation points. If $f_n^* = \min_{y \in D} s_n(y)$ and if $s_n(x_i) > f_n^*$, $i = 1, \ldots, n$, then the global minimizers of $g_n$ are the global minimizers of $s_n$. Thus one can minimize $s_n$, which is defined on the whole of $D$, to obtain $x_{n+1}$. If $f_n^* < \min_{y \in D} s_n(y)$, however, then $g_n(x)$ tends to infinity as $x$ tends to $x_i$, $i = 1, \ldots, n$. Let $h_n : D \to \mathbb{R}$ be defined as
$$ h_n(y) := \begin{cases} \dfrac{1}{g_n(y)}, & y \notin \{x_1, \ldots, x_n\}, \\[1ex] 0, & y = x_i, \ i = 1, \ldots, n. \end{cases} $$   (3.9)
The maximization of $h_n$ on $D$ is equivalent to the minimization of $g_n$. Further, $h_n$ is infinitely differentiable on $D \setminus \{x_1, \ldots, x_n\}$. It can also be shown, using the system (3.4), that it is in $C^0(D)$ in the linear case, in $C^1(D)$ in the thin plate spline case, in $C^2(D)$ in the cubic case, and in $C^\infty(D)$ in the multiquadric and Gaussian cases.

Under certain conditions on $f$ and the values $f_n^*$, $n \in \mathbb{N}$, it can be proved that a subsequence of the generated points $(x_n)_{n \in \mathbb{N}}$ converges to a global minimum. This is the subject of the following section.
1

4 Convergence of the method

Our aim is to prove convergence of the method for any continuous function $f$. A theorem by Törn and Žilinskas [19] tells us that the sequence that is generated by Algorithm 3 should be dense. Applied to our method, it states

Theorem 4 The algorithm converges for every continuous function $f$ if and only if it generates a sequence of points that is dense in $D$.

So our task is to establish the density of the sequence of generated points. Our main convergence result (Theorem 7), however, does not allow all choices of $\phi$ and $m$. The first part of this section states Theorem 7 and derives a few corollaries. The second part gives the proof of the theorem.

4.1 Convergence results

Assume that $f_n^*$ is set to $\min_{y \in D} s_n(y)$ on each iteration and that this choice is admissible. Then, if there are large function values in a neighbourhood $U$ of a global minimizer of $f$, the global minimizer of $s_n$ might be outside $U$ for every $n \ge n_0$, where $n_0$ is the number of initial points. In this case, $U$ is not explored and the global optimum of $f$ might be missed. Therefore, we have to assume that enough of the numbers $\min_{y \in D} s_n(y) - f_n^*$ are sufficiently large. Specifically, let $\epsilon > 0$ and $\rho \ge 0$ be constants, where additionally $\rho < 1$ in the linear case and $\rho < 2$ in the thin plate spline and cubic cases, and define
$$ \Delta_n := \min_{i \le n-1} \|x_n - x_i\|. $$   (4.1)
Then the condition
$$ \min_{y \in D} s_n(y) - f_n^* > \epsilon \, \Delta_n^{\rho/2} \, \|s_n\|_\infty, $$   (4.2)
for infinitely many $n \in \mathbb{N}$, where $\|\cdot\|_\infty$ denotes the maximum norm of a function on $D$, will lead to the required result. We note that the norms $\|s_n\|_\infty$ may diverge as $n \to \infty$.
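For illustration, the quantities in (4.1) and (4.2) are cheap to evaluate once the interpolant is available. The hedged sketch below computes $\Delta_n$ and an upper bound on admissible target values; it assumes that $\min_{y\in D} s_n(y)$ and an estimate of $\|s_n\|_\infty$ (obtained, say, by sampling) are supplied by the caller.

```python
import numpy as np

def target_upper_bound(X, s_min, s_sup, eps=1.0, rho=0.5):
    """Delta_n from (4.1) and the bound in condition (4.2): any target
    f_n^* strictly below  s_min - eps * Delta_n**(rho/2) * s_sup
    satisfies (4.2).  X holds all points so far, most recent last;
    s_sup is an estimate of ||s_n||_inf on D."""
    x_new, X_old = X[-1], X[:-1]
    delta_n = np.min(np.linalg.norm(X_old - x_new, axis=1))
    return s_min - eps * delta_n ** (rho / 2) * s_sup
```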
Unfortunately, the choice of $\phi$ and $m$ is restricted. In the proof of Theorem 7 we need the result that, for any $y \in D$ and any neighbourhood $U$ of $y$, $(-1)^{m_0+1} \mu_n(y)$ can be bounded above by a number that does not depend on $n$, if none of the points $x_1, \ldots, x_n$ is in $U$. This condition is achieved if there is a function that takes the value 1 at $y$, that is identically zero outside $U$, and that is in the function space $N_{\phi,m}(\mathbb{R}^d)$ as defined below.

Definition 5 Let $\phi$ from (2.2) and $m \ge m_0$ be given. A continuous function $F : D \to \mathbb{R}$, $D \subseteq \mathbb{R}^d$, belongs to the function space $N_{\phi,m}(D)$ if there exists a positive constant $C$ such that, for any choice of interpolation points $x_1, \ldots, x_n \in D$ for which (2.11) holds, the interpolant $s_n \in A_{\phi,m}$ to $F$ at these points has the property
$$ \langle s_n, s_n \rangle \le C. $$
The characterization of $N_{\phi,m}(D)$ is rather abstract. In the linear, cubic and thin plate spline cases, the following proposition, which is taken from Gutmann [3], provides a useful criterion to check whether it is satisfied. In the multiquadric and Gaussian cases, however, no such criterion is known.

Proposition 6 Let $\phi(r) = r$, $\phi(r) = r^2 \log r$ or $\phi(r) = r^3$. Further, let $\kappa = 1$ in the linear case, $\kappa = 2$ in the thin plate spline case and $\kappa = 3$ in the cubic case, and choose the integer $m$ such that $0 \le m \le d$ in the linear case, $1 \le m \le d+1$ in the thin plate spline case and $1 \le m \le d+2$ in the cubic case. Define $\nu := (d+\kappa)/2$ if $d+\kappa$ is even, and $\nu := (d+\kappa+1)/2$ otherwise. If $F \in C^\nu(\mathbb{R}^d)$ and has bounded support, then $F \in N_{\phi,m}(\mathbb{R}^d)$.

Global convergence will be established only for the cases covered by this proposition. It remains an open problem whether a similar property is achieved in other cases. Thus we have the following theorem.

Theorem 7 Let $\phi(r) = r$, $\phi(r) = r^2 \log r$ or $\phi(r) = r^3$. Further, choose the integer $m$ such that $0 \le m \le d$ in the linear case, $1 \le m \le d+1$ in the thin plate spline case and $1 \le m \le d+2$ in the cubic case. Let $(x_n)_{n \in \mathbb{N}}$ be the sequence generated by Algorithm 3, and let $s_n$ be the radial function that interpolates $(x_i, f(x_i))$, $i = 1, \ldots, n$. Assume that, for infinitely many $n \in \mathbb{N}$, the choice of $f_n^*$ satisfies (4.2). Then the sequence $(x_n)$ is dense in $D$.

A particular convergence result follows immediately from Theorems 4 and 7, because the right-hand side of (4.2) is some real number.

Corollary 8 Let the assumptions of Theorem 7 on $\phi$ and $m$ hold. Further, let $f$ be continuous, and, for infinitely many $n \in \mathbb{N}$, let $f_n^* = -\infty$. Then the method converges.

An interesting question is to find conditions on $f$ such that the maximum norm of an interpolant is uniformly bounded. If they hold, then the right-hand side of (4.2) can be replaced by $\epsilon \, \Delta_n^{\rho/2}$, so this constraint on $f_n^*$ can be checked easily. We consider the special case of linear splines in one dimension, when $d = 1$, $\phi(r) = r$ and $m = 0$. For arbitrary points $x_1, \ldots, x_n$, the piecewise linear interpolant $s_n$ attains its maximum and minimum values at interpolation points. Thus $\|s_n\|_\infty$ is bounded by $\|f\|_\infty$, a number that does not depend on the interpolation points. Therefore, in this case the term $\|s_n\|_\infty$ may be dropped from (4.2).

For other radial basis functions and other dimensions this simplification may fail. It is shown in the next lemma, however, that the uniform boundedness of the semi-norm of an interpolant is sufficient for the uniform boundedness of the maximum norm. Thus, the second convergence result applies to functions $f$ in $N_{\phi,m}(D)$.

Lemma 9 Let $f$ be in $N_{\phi,m}(D)$. Further, let $(x_n)_{n \in \mathbb{N}}$ be a sequence in $D$ with pairwise different elements, such that (2.11) holds for $n = n_0$. For $n \ge n_0$, denote the radial basis function interpolant to $f$ at $x_1, \ldots, x_n$ by $s_n$. Then $\|s_n\|_\infty$ is uniformly bounded by a number that only depends on $x_1, \ldots, x_{n_0}$.

Proof:
We fix $n$, and we let $y$ be any point of $D \setminus \{x_1, \ldots, x_n\}$. Let $\tilde s_n$ be the radial function that interpolates $(y, f(y))$ and $(x_i, f(x_i))$, $i = 1, \ldots, n$. By analogy with equation (3.1), it can be written as
$$ \tilde s_n(x) = s_n(x) + [f(y) - s_n(y)] \, \ell_n(y, x), \quad x \in \mathbb{R}^d, $$
where $\ell_n(y, \cdot)$ is still the cardinal function that interpolates $(x_i, 0)$, $i = 1, \ldots, n$, and $(y, 1)$. Thus, as shown in Section 3,
$$ \langle \tilde s_n, \tilde s_n \rangle = \langle s_n, s_n \rangle + [f(y) - s_n(y)]^2 \, (-1)^{m_0+1} \mu_n(y), $$
which gives the equation
$$ [f(y) - s_n(y)]^2 = \frac{\langle \tilde s_n, \tilde s_n \rangle - \langle s_n, s_n \rangle}{(-1)^{m_0+1} \mu_n(y)}, $$   (4.3)
the value of $(-1)^{m_0+1} \mu_n(y)$ being strictly positive.

Next we show that $(-1)^{m_0+1} \mu_n(y)$ is bounded away from zero. Let $\ell_{n_0}(y, \cdot)$ be the cardinal function that interpolates $(x_1, 0), \ldots, (x_{n_0}, 0)$ and $(y, 1)$. Then the semi-norm properties of $\langle \cdot, \cdot \rangle$ and Theorem 1 imply
$$ 0 < (-1)^{m_0+1} \mu_{n_0}(y) = \langle \ell_{n_0}(y, \cdot), \ell_{n_0}(y, \cdot) \rangle \le \langle \ell_n(y, \cdot), \ell_n(y, \cdot) \rangle = (-1)^{m_0+1} \mu_n(y). $$
For $n = n_0$, let $A$ and $A(y)$ be the matrices (2.10) and (3.5), respectively. By using Cramer's rule to solve (3.4), we find
$$ \mu_{n_0}(y) = \frac{\det A}{\det A(y)}. $$
Now $\det A$ is a nonzero constant, and $\det A(y)$ is bounded on $D$. It follows that $(-1)^{m_0+1} \mu_{n_0}(y)$ is bounded away from zero. Therefore there exists a constant $\eta > 0$ such that
$$ (-1)^{m_0+1} \mu_n(y) \ge \eta \quad \forall\, y \in D \setminus \{x_1, \ldots, x_{n_0}\}, \ n \ge n_0. $$   (4.4)
As $f \in N_{\phi,m}(D)$, $\langle \tilde s_n, \tilde s_n \rangle$ is bounded above by a positive constant $C$. Further, $\langle s_n, s_n \rangle$ is nonnegative. It follows from (4.3) and (4.4) that
$$ |f(y) - s_n(y)| \le \sqrt{C/\eta}, \quad y \in D \setminus \{x_1, \ldots, x_n\}. $$
Further, because $f$ is bounded on $D$, we obtain
$$ |s_n(y)| \le \sqrt{C/\eta} + \|f\|_\infty. $$
Note that the right-hand side is independent of $n$ and $y$, as required. Alternatively, if $y \in \{x_1, \ldots, x_n\}$, we have
$$ |s_n(y)| = |f(y)| \le \|f\|_\infty, $$
which completes the proof.

Next, by applying Proposition 6, we obtain a criterion that ensures that $f$ is in $N_{\phi,m}(D)$.

Proposition 10 Let $\phi$, $m$ and $\nu$ be defined as in Proposition 6, and let $f \in C^\nu(D)$, where $D \subset \mathbb{R}^d$ is compact. Then $f \in N_{\phi,m}(D)$.

Proof:
By Whitney's theorem ([20]), $f$ can be extended to a function $F \in C^\nu(\mathbb{R}^d)$ that is equal to $f$ on $D$. Now $D$ is contained in a closed ball of radius $R$, say, and there is an infinitely differentiable function $g$ with $g(x) = 1$ for $\|x\| \le R$ and $g(x) = 0$ for $\|x\| \ge 2R$. Thus $F \cdot g$ is in $C^\nu(\mathbb{R}^d)$, and by Proposition 6 it is in $N_{\phi,m}(\mathbb{R}^d)$. Since $F \cdot g$ is equal to $f$ on $D$, it follows from the definition of the semi-norm that $f \in N_{\phi,m}(D)$.

We complete this subsection by combining Theorem 7, Lemma 9 and Proposition 10.

Corollary 11 Let the assumptions of Theorem 7 on $\phi$ and $m$ hold. Let $\nu$ be as in Proposition 6, and let $f \in C^\nu(D)$. Further, assume that, for infinitely many $n \in \mathbb{N}$, $f_n^*$ has the property
$$ \min_{y \in D} s_n(y) - f_n^* \ge \epsilon \, \Delta_n^{\rho/2}, $$
where $\epsilon > 0$ is a constant, and where $\Delta_n$ and $\rho$ are as in the first paragraph of this subsection. Then the method converges.

4.2 Proof of Theorem 7


In order to establish Theorem 7, some lemmas are needed on the behaviour of the coefficients $\mu_n$.

Lemma 12 Let $\phi$ be any of the radial basis functions in (2.2), and let $m_0$ and $m \ge m_0$ be chosen as in Section 2. Let $D \subset \mathbb{R}^d$ be compact, and let $(x_n)_{n \in \mathbb{N}}$ be a convergent sequence in $D$ with pairwise different elements. Further, let $(y_n)_{n \in \mathbb{N}}$ be a sequence in $D$ such that $y_n \ne x_n$, $n \in \mathbb{N}$, and $\lim_{n \to \infty} \|x_n - y_n\| = 0$. Choose $k$ points $z_1, \ldots, z_k \in D$ that satisfy condition (2.11). Assume $(x_n)$ converges to $x^* \in D \setminus \{z_1, \ldots, z_k\}$. For any $y \in D \setminus \{z_1, \ldots, z_k, y_{n+1}\}$, let $\tilde\ell_y$ be the cardinal spline that interpolates the data $(z_1, 0), \ldots, (z_k, 0), (y_{n+1}, 0)$ and $(y, 1)$, and let $\tilde\mu_n(y)$ be the coefficient of $\tilde\ell_y$ that satisfies $(-1)^{m_0+1} \tilde\mu_n(y) = \langle \tilde\ell_y, \tilde\ell_y \rangle$. Then, for $0 \le \rho < 1$ in the linear case and $0 \le \rho < 2$ in the other cases,
$$ \lim_{n \to \infty} (-1)^{m_0+1} \, \|y_{n+1} - x_{n+1}\|^{\rho} \, \tilde\mu_n(x_{n+1}) = \infty. $$   (4.5)

Proof:
Let $A_n$ and $A_n(x_{n+1})$ be the matrices of the form (2.10) for the interpolation points $z_1, \ldots, z_k, y_{n+1}$ and $z_1, \ldots, z_k, y_{n+1}, x_{n+1}$, respectively. For sufficiently large $n$, neither $x_{n+1}$ nor $y_{n+1}$ is in the set $\{z_1, \ldots, z_k\}$. Thus, $A_n$ and $A_n(x_{n+1})$ are nonsingular. Cramer's rule implies
$$ \tilde\mu_n(x_{n+1}) = \frac{\det A_n}{\det A_n(x_{n+1})}. $$   (4.6)
Also, let $A$ be the matrix of the form (2.10) for the interpolation points $z_1, \ldots, z_k, x^*$. By the continuity of the determinant,
$$ \lim_{n \to \infty} \det A_n = \det A \ne 0. $$   (4.7)
In order to investigate the behaviour of $\|y_{n+1} - x_{n+1}\|^{-\rho} \det A_n(x_{n+1})$, we let
$$ v(y) := \big( \phi(\|y - z_1\|), \ldots, \phi(\|y - z_k\|) \big)^T, \quad y \in D, $$
and
$$ p(y) := \big( p_1(y), \ldots, p_{\hat m}(y) \big)^T, \quad y \in D, $$
where $\hat m = \dim \Pi_m$ and $p_1, \ldots, p_{\hat m}$ are as in Section 2. Thus $A_n(x_{n+1})$ is the matrix
$$ A_n(x_{n+1}) = \begin{pmatrix} \Phi & v(y_{n+1}) & v(x_{n+1}) & P \\ v(y_{n+1})^T & \phi(0) & \phi(\|y_{n+1} - x_{n+1}\|) & p(y_{n+1})^T \\ v(x_{n+1})^T & \phi(\|y_{n+1} - x_{n+1}\|) & \phi(0) & p(x_{n+1})^T \\ P^T & p(y_{n+1}) & p(x_{n+1}) & 0 \end{pmatrix}, $$   (4.8)
where $\Phi$ and $P$ correspond to (2.3) and (2.9), respectively, if we set $n = k$ and $\{x_1, \ldots, x_n\} = \{z_1, \ldots, z_k\}$. Now the rows
$$ \big( v(y_{n+1})^T \ \ \ \phi(0) \ \ \ \phi(\|y_{n+1} - x_{n+1}\|) \ \ \ p(y_{n+1})^T \big) $$
and
$$ \big( v(x_{n+1})^T \ \ \ \phi(\|y_{n+1} - x_{n+1}\|) \ \ \ \phi(0) \ \ \ p(x_{n+1})^T \big) $$
have the same limit as $n \to \infty$, so $\det A_n(x_{n+1})$ tends to zero. Hence the properties (4.6) and (4.7) prove the assertion (4.5) for $\rho = 0$.

For $\rho > 0$, note that the value of the determinant of the matrix (4.8) does not change if we replace the second row by the difference between the second and third rows, and subsequently replace the second column by the difference between the second and third columns. Then $\det A_n(x_{n+1})$ becomes
$$ \begin{vmatrix} \Phi & v(y_{n+1}) - v(x_{n+1}) & v(x_{n+1}) & P \\ v(y_{n+1})^T - v(x_{n+1})^T & 2[\phi(0) - \phi(\|y_{n+1} - x_{n+1}\|)] & \phi(\|y_{n+1} - x_{n+1}\|) - \phi(0) & p(y_{n+1})^T - p(x_{n+1})^T \\ v(x_{n+1})^T & \phi(\|y_{n+1} - x_{n+1}\|) - \phi(0) & \phi(0) & p(x_{n+1})^T \\ P^T & p(y_{n+1}) - p(x_{n+1}) & p(x_{n+1}) & 0 \end{vmatrix}. $$   (4.9)
We have to divide the determinant by $\|y_{n+1} - x_{n+1}\|^{\rho}$, so we divide the second row and then the second column of (4.9) by $\|y_{n+1} - x_{n+1}\|^{\rho/2}$. Then the following remarks are helpful.

For each choice of $\phi$ and each $j = 1, \ldots, k$, the function $\phi(\|z_j - x\|)$, $x \in D$, is Lipschitz continuous, so the components of $v(y_{n+1}) - v(x_{n+1})$ satisfy
$$ |\phi(\|z_j - y_{n+1}\|) - \phi(\|z_j - x_{n+1}\|)| \le \mathrm{const} \, \|x_{n+1} - y_{n+1}\|, \quad j = 1, \ldots, k. $$
Thus for $\rho < 2$ we have
$$ \lim_{n \to \infty} \frac{1}{\|y_{n+1} - x_{n+1}\|^{\rho/2}} \big[ \phi(\|z_j - y_{n+1}\|) - \phi(\|z_j - x_{n+1}\|) \big] = 0. $$   (4.10)
Similarly, for $\rho < 2$, the components of $p(y_{n+1}) - p(x_{n+1})$ have the property
$$ \lim_{n \to \infty} \frac{1}{\|y_{n+1} - x_{n+1}\|^{\rho/2}} \big[ p_i(y_{n+1}) - p_i(x_{n+1}) \big] = 0. $$   (4.11)
Finally, we deduce
$$ \lim_{n \to \infty} \frac{1}{\|y_{n+1} - x_{n+1}\|^{\rho}} \big[ \phi(\|y_{n+1} - x_{n+1}\|) - \phi(0) \big] = 0, $$   (4.12)
for $\rho < 1$ in the linear case, for $\rho < 2$ in the thin plate spline, multiquadric and Gaussian cases, and for $\rho < 3$ in the cubic case. This is clear in the linear, thin plate spline and cubic cases. In the other two cases it follows from a second order Taylor expansion of $\phi$, because $\phi'(0) = 0$ and $\phi''$ is bounded on $\mathbb{R}_+$. Thus (4.9)--(4.12) provide
$$ \lim_{n \to \infty} \|y_{n+1} - x_{n+1}\|^{-\rho} \det A_n(x_{n+1}) = 0 $$
for the given values of $\rho$. Hence (4.6) and (4.7) imply (4.5).

Now we obtain

Lemma 13 Let $\phi$, $m_0$ and $m$ be chosen as in Lemma 12, and let $(x_n)_{n \in \mathbb{N}}$ be the sequence generated by Algorithm 3. Further, let $0 \le \rho < 1$ in the linear case and $0 \le \rho < 2$ in the other cases. Then, for every convergent subsequence $(x_{n_k})_{k \in \mathbb{N}}$ of $(x_n)$,
$$ \lim_{k \to \infty} (-1)^{m_0+1} \, \Delta_{n_k}^{\rho} \, \mu_{n_k-1}(x_{n_k}) = \infty, $$
where $\mu_n(\cdot)$ is defined in Section 3 and $\Delta_{n_k}$ is expression (4.1) for $n = n_k$.

Proof:
For $n \ge 2$, define $j_n$ to be the natural number $j$ that minimizes $\|x_n - x_j\|$, $j < n$, so $\Delta_n = \|x_n - x_{j_n}\|$. Further, let $(y_n)_{n \in \mathbb{N}}$ be the sequence
$$ y_n := \begin{cases} x_2, & n = 1, \\ x_{j_n}, & n \ge 2. \end{cases} $$
Let $(x_{n_k})_{k \in \mathbb{N}}$ be a subsequence of $(x_n)_{n \in \mathbb{N}}$ that converges to $x^*$, say. Convergence and the choice of $(y_n)_{n \in \mathbb{N}}$ imply $\lim_{k \to \infty} \|x_{n_k} - y_{n_k}\| = 0$.

The initial step of Algorithm 3 provides a finite number of points that satisfy (2.11), so the initial interpolation matrix (2.10) is nonsingular. If one of these points is $x^*$, we pick $x_{n_{k_0}}$ in a neighbourhood of $x^*$ so that the interpolation matrix of $x_{n_{k_0}}$ and the other initial points is also nonsingular. Therefore there exist points $\hat x_1, \ldots, \hat x_l$ in $(x_n)_{n \in \mathbb{N}}$ such that their interpolation matrix is nonsingular, and $x^* \notin \{\hat x_1, \ldots, \hat x_l\}$.

For sufficiently large $k \in \mathbb{N}$, such that $y_{n_k} \notin \{\hat x_1, \ldots, \hat x_l\}$, and for any $y \in D \setminus \{x_1, \ldots, x_{n_k-1}\}$, we let $\hat\ell_k(y, \cdot)$ be the radial function that interpolates $(\hat x_1, 0), \ldots, (\hat x_l, 0), (y_{n_k}, 0)$ and $(y, 1)$, and we let $\ell_{n_k-1}(y, \cdot)$ be the interpolant to $(x_1, 0), \ldots, (x_{n_k-1}, 0)$ and $(y, 1)$. Because $\ell_{n_k-1}(y, \cdot)$ interpolates $(y_{n_k}, 0)$ and $(\hat x_i, 0)$, $i = 1, \ldots, l$, for sufficiently large $k$, (3.7) and Theorem 1 imply the inequality
$$ (-1)^{m_0+1} \hat\mu_k(y) = \langle \hat\ell_k(y, \cdot), \hat\ell_k(y, \cdot) \rangle \le \langle \ell_{n_k-1}(y, \cdot), \ell_{n_k-1}(y, \cdot) \rangle = (-1)^{m_0+1} \mu_{n_k-1}(y) $$   (4.13)
for the coefficients $\hat\mu_k$ and $\mu_{n_k-1}$.

We apply Lemma 12 in the case when $\{z_1, \ldots, z_k\}$ is the set $\{\hat x_1, \ldots, \hat x_l\}$ and $n = n_k - 1$. It follows that
$$ \lim_{k \to \infty} (-1)^{m_0+1} \, \Delta_{n_k}^{\rho} \, \hat\mu_k(x_{n_k}) = \lim_{k \to \infty} (-1)^{m_0+1} \, \|x_{n_k} - y_{n_k}\|^{\rho} \, \hat\mu_k(x_{n_k}) = \infty, $$
with the choice of $\rho$ stated in Lemma 12. Thus, setting $y = x_{n_k}$ in (4.13), we obtain that $(-1)^{m_0+1} \Delta_{n_k}^{\rho} \mu_{n_k-1}(x_{n_k})$ tends to infinity as $k \to \infty$.

Finally, we show, using Proposition 6, that the coefficients $\mu_n(y)$ are uniformly bounded if $y$ is bounded away from the points that are generated by the algorithm.

Lemma 14 Let $\phi(r) = r$, $\phi(r) = r^2 \log r$ or $\phi(r) = r^3$. Further, choose the integer $m$ such that $0 \le m \le d$ in the linear case, $1 \le m \le d+1$ in the thin plate spline case and $1 \le m \le d+2$ in the cubic case. Let $(x_n)_{n \in \mathbb{N}}$ be the sequence generated by Algorithm 3, and let $n_0$ be the number of points chosen in the initial step. Assume that there exist $y_0 \in D$ and a neighbourhood $N := \{x \in \mathbb{R}^d : \|x - y_0\| < \delta\}$, $\delta > 0$, that does not contain any point of the sequence. Then there exists $K > 0$, which depends only on $y_0$ and $\delta$, such that
$$ (-1)^{m_0+1} \mu_n(y_0) \le K \quad \forall\, n \ge n_0. $$

Proof:
For any $n \ge n_0$, let $\ell_n$ be the radial function that is defined by $\ell_n(x_i) = 0$, $i = 1, \ldots, n$, and $\ell_n(y_0) = 1$. There exists a compactly supported, infinitely differentiable function $F$ that takes the value 1 at $y_0$ and 0 on $\mathbb{R}^d \setminus N$. It follows from Proposition 6 that $F \in N_{\phi,m}(\mathbb{R}^d)$. Since $\ell_n$ interpolates $F$ at $x_1, \ldots, x_n$ and $y_0$, there is a positive number $K$, depending on $y_0$ and $\delta$, such that
$$ (-1)^{m_0+1} \mu_n(y_0) = \langle \ell_n, \ell_n \rangle \le K, \quad n \ge n_0. $$
Now we are ready to prove Theorem 7.


Proof of Theorem 7:
Assume there is y0 2 D and an open neighbourhood U = fx 2 IRd : kx ? y0k < g,
 > 0, that does not contain an interpolation point. The iteration step of Algorithm 3 gives
gn(xn+1 )  gn(y0); n  n0 ;
where n0 is the number of points chosen in the initial step of the algorithm.
By assumption (4.2), there is a subsequence (nk )k2IN of the natural numbers such
that
2
min
s (y) ? fnk ?1 >  =
(4.14)
nk ?1 ksnk ?1 k1  0; k 2 IN;
y2D nk ?1

with  > 0, nk ? being the expression (4.1) for n = nk ? 1, 0   < 1 in the linear
and 0   < 2 in the thin plate spline and cubic cases. The sequence (xnk )k2IN is a
sequence in a compact set, thus it contains a convergent subsequence. Therefore,
without loss of generality, we assume that (xnk )k2IN itself converges.
For all k 2 IN , xnk is the minimizer of gnk ? (x). Thus, if fnk ? > ?1,
1

(?1)m0 nk? (xnk )[snk ? (xnk ) ? fnk ? ]


 (?1)m0 nk ? (y )[snk ? (y ) ? fnk ? ] :
+1

+1

(4.15)

If $\|s_{n_k-1}\|_\infty > 0$, this inequality, condition (4.14) and the definition of $\|\cdot\|_\infty$ provide
$$ \begin{aligned} (-1)^{m_0+1} \mu_{n_k-1}(x_{n_k}) &\le (-1)^{m_0+1} \mu_{n_k-1}(y_0) \left[ \frac{s_{n_k-1}(y_0) - f^*_{n_k-1}}{s_{n_k-1}(x_{n_k}) - f^*_{n_k-1}} \right]^2 \\ &\le (-1)^{m_0+1} \mu_{n_k-1}(y_0) \left[ 1 + \frac{|s_{n_k-1}(y_0) - s_{n_k-1}(x_{n_k})|}{s_{n_k-1}(x_{n_k}) - f^*_{n_k-1}} \right]^2 \\ &\le (-1)^{m_0+1} \mu_{n_k-1}(y_0) \left[ 1 + \frac{1}{\epsilon \, \Delta_{n_k}^{\rho/2}} \, \frac{|s_{n_k-1}(y_0) - s_{n_k-1}(x_{n_k})|}{\|s_{n_k-1}\|_\infty} \right]^2 \\ &\le (-1)^{m_0+1} \mu_{n_k-1}(y_0) \left[ 1 + \frac{2}{\epsilon} \, \Delta_{n_k}^{-\rho/2} \right]^2. \end{aligned} $$
If $\|s_{n_k-1}\|_\infty = 0$, then $s_{n_k-1}(y) - f^*_{n_k-1}$ is a positive number independent of $y$, so (4.15) gives
$$ (-1)^{m_0+1} \mu_{n_k-1}(x_{n_k}) \le (-1)^{m_0+1} \mu_{n_k-1}(y_0) \le (-1)^{m_0+1} \mu_{n_k-1}(y_0) \left[ 1 + \frac{2}{\epsilon} \, \Delta_{n_k}^{-\rho/2} \right]^2 $$
for any positive $\epsilon$, as before. Remark 2 shows that this inequality also holds in the case $f^*_{n_k-1} = -\infty$. Multiplying both sides by $\Delta_{n_k}^{\rho}$ yields
$$ \Delta_{n_k}^{\rho} \, (-1)^{m_0+1} \mu_{n_k-1}(x_{n_k}) \le (-1)^{m_0+1} \mu_{n_k-1}(y_0) \left[ \Delta_{n_k}^{\rho/2} + \frac{2}{\epsilon} \right]^2. $$   (4.16)
By Lemma 13, the left-hand side of (4.16) tends to infinity as $k$ tends to infinity. However, Lemma 14 states that $(-1)^{m_0+1} \mu_n(y_0)$ is bounded above by a constant that does not depend on $n$. Thus the right-hand side of (4.16) is bounded by a constant that is independent of $k$, which contradicts (4.16). Therefore there is a point in the sequence that is an element of $U$. This implies that in each neighbourhood of an arbitrary $y \in D$ there are infinitely many elements of $(x_n)_{n \in \mathbb{N}}$, so the sequence is dense in $D$.
+1

5 Relations to statistical global optimization


In this section we consider the similarities between the given radial basis function method and the P-algorithm. The idea of that method was proposed by Kushner [11], [12] for one-dimensional problems. Here the objective function is regarded as a realization of a Brownian motion stochastic process. If real numbers $x_1 < \ldots < x_n$ are given, and their function values $f(x_1), \ldots, f(x_n)$ have been calculated, the model yields, for each $x$ in the feasible set, a mean value $\mathrm{Mean}(x)$ and a variance $\mathrm{Var}(x)$. $\mathrm{Mean}(x)$ serves as a prediction of the true function value at $x$, while $\mathrm{Var}(x)$ is a measure of uncertainty. It turns out that $\mathrm{Mean}$ is the piecewise linear interpolant of the given data. The variance is piecewise quadratic on $[x_1, x_n]$, nonnegative, and takes the value zero at $x_1, \ldots, x_n$. For a real number $x$, let $F_x$ be the normally distributed random variable with mean $\mathrm{Mean}(x)$ and variance $\mathrm{Var}(x)$. Now a nonnegative $\epsilon_n$ is chosen, and the next point $x_{n+1}$ will be the one that maximizes the utility function
$$ P\Big( F_x \le \min_{i=1,\ldots,n} f(x_i) - \epsilon_n \Big), \quad x \in D, $$   (5.1)
where $P$ denotes probability. One can show that maximizing (5.1) is equivalent to maximizing
$$ \frac{\mathrm{Var}(x)}{\big[ \mathrm{Mean}(x) - \min\{f(x_1), \ldots, f(x_n)\} + \epsilon_n \big]^2}, \quad x \in D. $$   (5.2)

Compare our method in one dimension with the choice of linear splines, i.e. $\phi(r) = r$ and $m = 0$, and with the target values $f_n^* = \min\{f(x_1), \ldots, f(x_n)\} - \epsilon_n$. In this case, the interpolant $s_n$ is identical to $\mathrm{Mean}$. Further, except for a constant factor,
$$ \mathrm{Var}(x) = -\frac{1}{\mu_n(x)}. $$
Therefore, Kushner's method and our method using linear splines are equivalent.
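This equivalence is easy to check numerically. The sketch below is an illustration of ours, not part of the paper: it compares $-1/\mu_n(x)$ for the linear spline $\phi(r) = r$, $m = 0$, with the Brownian-motion variance $(x - x_i)(x_{i+1} - x)/(x_{i+1} - x_i)$ between two neighbouring data points; with this normalization of the Brownian motion the constant factor is 2.

```python
import numpy as np

def mu_linear_1d(xs, x):
    """mu_n(x) for phi(r) = r, m = 0 in one dimension, from system (3.4)."""
    pts = np.append(xs, x)                     # interpolation points plus x
    n1 = len(pts)
    Phi = np.abs(pts[:, None] - pts[None, :])  # phi(|x_i - x_j|)
    P = np.ones((n1, 1))                       # constants, a basis of Pi_0
    A = np.block([[Phi, P], [P.T, np.zeros((1, 1))]])
    rhs = np.zeros(n1 + 1)
    rhs[n1 - 1] = 1.0                          # cardinality condition at x
    return np.linalg.solve(A, rhs)[n1 - 1]

xs = np.array([0.0, 0.3, 1.0, 1.7])
x = 0.6                                        # lies between 0.3 and 1.0
brownian_var = (x - 0.3) * (1.0 - x) / (1.0 - 0.3)
print(-1.0 / mu_linear_1d(xs, x), 2.0 * brownian_var)   # the two values agree
```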
Žilinskas [21], [22] extends this approach to Gaussian random processes in several dimensions. He uses the selection rule (5.1) and introduces the name "P-algorithm". In addition, he gives an axiomatic description of the terms involved in (5.1), namely the mean value function, the variance function and the utility function. We relate these results to our method.

Consider a symmetric function $\psi : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$, $(x, z) \mapsto \psi(x, z)$, and assume that $\psi$ is conditionally positive or negative definite of order $m$. This means that there exists $\beta \in \{0, 1\}$ such that, given $n$ different points $x_1, \ldots, x_n \in \mathbb{R}^d$ and multipliers $\lambda_1, \ldots, \lambda_n \in \mathbb{R}$, we have
$$ (-1)^{\beta} \sum_{i,j=1}^{n} \lambda_i \lambda_j \, \psi(x_i, x_j) > 0, $$
if the $\lambda_i$, $i = 1, \ldots, n$, are not all zero and satisfy
$$ \sum_{i=1}^{n} \lambda_i \, p(x_i) = 0, \quad p \in \Pi_m. $$
Denote the matrix with the elements $\psi(x_i, x_j)$, $i, j = 1, \ldots, n$, by $\Psi$, and the matrix with the elements $p_j(x_i)$, $i = 1, \ldots, n$, $j = 1, \ldots, \hat m$, by $P$, where $\{p_j : j = 1, \ldots, \hat m\}$ is a basis of $\Pi_m$ and $\hat m$ is its dimension. The analogue of expression (2.10) is the matrix
$$ A = \begin{pmatrix} \Psi & P \\ P^T & 0 \end{pmatrix}. $$   (5.3)
Further, we now let the interpolant $s$ to the components of $F = (f(x_1), \ldots, f(x_n))^T$ be the function
$$ s(y) = \sum_{i=1}^{n} \lambda_i \, \psi(y, x_i) + \sum_{j=1}^{\hat m} c_j \, p_j(y), $$
whose coefficients solve the system
$$ A \begin{pmatrix} \lambda \\ c \end{pmatrix} = \begin{pmatrix} F \\ 0_{\hat m} \end{pmatrix}. $$
It can be written as
$$ s(y) = v_m(y)^T A^{-1} \begin{pmatrix} F \\ 0_{\hat m} \end{pmatrix}, \quad y \in D, $$   (5.4)
where $v_m(y)$ is the vector
$$ v_m(y) := \big( \psi(y, x_1), \ldots, \psi(y, x_n), p_1(y), \ldots, p_{\hat m}(y) \big)^T. $$   (5.5)
The nonnegative function
$$ \mathrm{Var}(y) = \big| \psi(y, y) - v_m(y)^T A^{-1} v_m(y) \big|, \quad y \in D, $$   (5.6)
is assumed to be a measure of uncertainty. Note that $v_m(x_i)$ is the $i$-th column of $A$, $i = 1, \ldots, n$, so we obtain
$$ \mathrm{Var}(x_i) = \big| \psi(x_i, x_i) - v_m(x_i)^T A^{-1} v_m(x_i) \big| = \big| \psi(x_i, x_i) - v_m(x_i)^T e_i \big| = 0. $$
Thus there is no uncertainty at the interpolation points, which is meaningful because we know the true function values there.
For the P-algorithm, $\psi$ is interpreted as the correlation function of a Gaussian stochastic process. The use of a normal distribution, for example, gives $\psi(x, y) := e^{-\|x - y\|^2/\sigma^2}$, but other choices of $\psi$ are also considered in the literature. All of them are positive definite, so we set $m = -1$. The conditional mean and the conditional variance can be expressed as ([21])
$$ \mathrm{Mean}(y) = \big( f(x_1), \ldots, f(x_n) \big) \, \Psi^{-1} \begin{pmatrix} \psi(y, x_1) \\ \vdots \\ \psi(y, x_n) \end{pmatrix}, $$   (5.7)
$$ \mathrm{Var}(y) = \psi(y, y) - \big( \psi(y, x_1), \ldots, \psi(y, x_n) \big) \, \Psi^{-1} \begin{pmatrix} \psi(y, x_1) \\ \vdots \\ \psi(y, x_n) \end{pmatrix}, $$   (5.8)
which agree with (5.4) and (5.6).

For our method, given $\phi$ and $m$, it is suitable to define $\psi(x, y) := \phi(\|x - y\|)$. Thus $\Psi = \Phi$, and the coefficients of the interpolant $s$ solve (2.13). This gives the form
$$ s(y) = v_m(y)^T A^{-1} \begin{pmatrix} F \\ 0_{\hat m} \end{pmatrix}, $$   (5.9)
which is the same as the function (5.4).


We have seen already that $|1/\mu_n|$ can be regarded as a variance in the case of linear splines in one dimension. An expression for it containing the matrix $A$ and the vector $v_m(x)$ can be derived in the following way. For any $y \in D \setminus \{x_1, \ldots, x_n\}$, consider the cardinal function (3.3). The second cardinality condition from (3.2) implies
$$ \frac{1}{\mu_n(y)} = \phi(0) + \sum_{i=1}^{n} \frac{\alpha_i(y)}{\mu_n(y)} \, \phi(\|y - x_i\|) + \sum_{j=1}^{\hat m} \frac{b_j(y)}{\mu_n(y)} \, p_j(y). $$   (5.10)
The coefficients $\alpha(y)$, $\mu_n(y)$ and $b(y)$ solve the system (3.4). Moreover, the vector (5.5) contains the first $n$ and the last $\hat m$ elements of the $(n+1)$-th column of $A(y)$. Therefore $\alpha(y)$ and $b(y)$ also solve
$$ A \begin{pmatrix} \alpha(y) \\ b(y) \end{pmatrix} = -\mu_n(y) \, v_m(y), $$
which implies
$$ \frac{1}{\mu_n(y)} \begin{pmatrix} \alpha(y) \\ b(y) \end{pmatrix} = -A^{-1} v_m(y). $$
Thus, replacing the terms $\alpha_i(y)/\mu_n(y)$ and $b_j(y)/\mu_n(y)$ in (5.10), we find
$$ \frac{1}{\mu_n(y)} = \phi(0) - v_m(y)^T A^{-1} v_m(y). $$   (5.11)
It follows from $\psi(x, x) = \phi(0)$, $x \in D$, that expression (5.6) is equivalent to $|1/\mu_n(y)|$.
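Identity (5.11) is easy to verify numerically: with the matrices assembled as in the earlier sketches, $1/\mu_n(y)$ obtained from the bordered system (3.4) coincides with $\phi(0) - v_m(y)^T A^{-1} v_m(y)$. A small check of ours, for the cubic case in two dimensions (the random points and the dense solves are incidental choices):

```python
import numpy as np

phi = lambda r: r ** 3
rng = np.random.default_rng(0)
X = rng.random((6, 2))                          # interpolation points
y = rng.random(2)                               # evaluation point

Phi = phi(np.linalg.norm(X[:, None] - X[None, :], axis=2))
P = np.hstack([np.ones((6, 1)), X])
A = np.block([[Phi, P], [P.T, np.zeros((3, 3))]])
v = np.concatenate([phi(np.linalg.norm(X - y, axis=1)), [1.0], y])  # v_m(y)

# mu_n(y) from the bordered system (3.4)
Phi_y = np.block([[Phi, v[:6, None]], [v[None, :6], [[phi(0.0)]]]])
P_y = np.vstack([P, v[6:]])
A_y = np.block([[Phi_y, P_y], [P_y.T, np.zeros((3, 3))]])
rhs = np.zeros(10)
rhs[6] = 1.0
mu = np.linalg.solve(A_y, rhs)[6]

print(1.0 / mu, phi(0.0) - v @ np.linalg.solve(A, v))   # the two values agree
```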

Finally, we consider the selection rule for the next point. For the P-algorithm, it has already been noted that the maximization of (5.1) is equivalent to the maximization of (5.2). For a given target value $f^*$, define the function $U : \mathbb{R}^2 \to \mathbb{R}$ as
$$ U(M, V) := \frac{V}{(M - f^*)^2}. $$   (5.12)
It is increasing in $V$ and decreasing in $M$, if $f^* \le M$. Also, it satisfies the axioms of rational search stated in Žilinskas [22]. Then, employing (5.4) and (5.6), both methods choose the point that maximizes
$$ U\big( s(y), \mathrm{Var}(y) \big), \quad y \in D. $$
6 Search strategies and practical questions

Practical features of our method have received little attention in this paper. Several questions arise concerning the choice of parameters in Algorithm 3.
1. What radial basis function $\phi$ should be chosen, and what polynomial degree $m$?
2. What is a good strategy for the choice of the target values $f_n^*$?
3. Given a target value, how should the minimization of $g_n$ in (3.8) (or the maximization of $h_n$ in (3.9)) be carried out? Should we approximate the global optimum of $g_n$ (or $h_n$), or compute a (possibly non-global) local minimum?
The first problem has not been investigated thoroughly. Experiments using cubic and thin plate splines on a few test functions suggest that one cannot say in general that one of them is better than the other. Experiments have not been tried yet for the other types.
The choice of target values is crucial for the performance of the method. The interpretation of the two extremal cases has been noted in Section 3. Specifically, the choice $f_n^* = \min_{y \in D} s_n(y)$ means that we trust our model and assume that the minimizer of $s_n$ is close to the global minimizer of $f$. In the other case, namely $f_n^* = -\infty$, we try to find a point in a region that has not been explored at all. It may be best to employ a mix between values of $f_n^*$ that are suitable for convergence to a local minimizer and values that provide points in previously unexplored regions of the domain.
Research is going on for the third question. We prefer to maximize $h_n$, as this function is defined everywhere on $D$. It might seem strange that we consider computing the global optimum of $h_n$, i.e. that a global optimization problem is replaced by another one. However, unlike $f$, $h_n$ can be evaluated quickly, and derivatives are available as well. Also, we know roughly where the local maxima of $h_n$ lie. Thus the maximization of $h_n$ is much easier than the minimization of $f$. In addition, as the problem (GOP) is very difficult under our assumptions, it would take too long to compute a global minimizer accurately. Therefore, from a practical point of view, we are interested in an approximate solution of (GOP). So it should suffice to determine an approximation to the maximizer of (3.9). As far as the second option in question 3 is concerned, we have to find a way to choose starting points or search regions in order to ensure fast convergence, which is still an open problem.

function          dimension   domain              no. of local   no. of global
                                                  minima         minima
Branin            2           [-5,10] x [0,15]    3              3
Goldstein-Price   2           [-2,2]^2            4              1
Hartman 3         3           [0,1]^3             4              1
Shekel 5          4           [0,10]^4            5              1
Shekel 7          4           [0,10]^4            7              1
Shekel 10         4           [0,10]^4            10             1
Hartman 6         6           [0,1]^6             4              1

Table 1: Dixon-Szegő test functions and their dimension, the domain and the number of local and global minima.
Experiments show that large differences between function values can cause the interpolant to oscillate very strongly. Thus its minimal value can be much below the least calculated function value. We have found in numerical computations that these inefficiencies are reduced if large function values are replaced by the median of all available function values.
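A hedged sketch of this preprocessing step (our illustration; the text above only specifies replacement by the median, so the exact rule below is one possible reading):

```python
import numpy as np

def clip_to_median(F):
    """Replace function values above the median of all available values
    by the median itself, before fitting the interpolant."""
    F = np.asarray(F, dtype=float)
    return np.minimum(F, np.median(F))
```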
Some experiments were performed using the test functions proposed by Dixon and Szegő [2]. Table 1 gives the name of each function, the dimension, the domain and the number of local and global minima in that domain. The maximization of (3.9) was carried out using a version of the tunneling method (Levy and Montalvo [13]).

The target values $f_n^*$ are determined as follows. The idea is to perform cycles of $N+1$ iterations for some $N \in \mathbb{N}$, where each cycle employs a range of target values, starting with a low one (global search) and ending with a value of $f_n^*$ that is close to $\min_{y \in D} s_n(y)$ (local search). Then we go back to a global search, starting the cycle again. The results that we report have been obtained using the following strategy, which is also sketched in code after this paragraph. We choose the cycle length $N = 5$. Let the number of initial points be $n_0$, let the cycle start at $n = \tilde n$, and let the function values be ordered, i.e. $f(x_1) \le \ldots \le f(x_n)$. If $f(x_1) = f(x_n)$, the interpolant is a constant function, because we pick $m \ge 0$, so the maximization of (3.9) is equivalent to the maximization of $|1/\mu_n|$ if $f_n^* < f(x_1)$. In this case, we choose $f_n^* = -\infty$. Otherwise, for $\tilde n \le n \le \tilde n + N - 1$, we set
$$ f_n^* = \min_{y \in D} s_n(y) - \left( \frac{N - (n - \tilde n)}{N} \right)^2 \left[ f(x_{\alpha(n)}) - \min_{y \in D} s_n(y) \right], $$
where $\alpha(\tilde n) = \tilde n$ and $\alpha(n) = \alpha(n-1) - \lfloor (n - n_0)/N \rfloor$ for $\tilde n + 1 \le n \le \tilde n + N - 1$. When $n = \tilde n + N$ we set $f_n^* = \min_{y \in D} s_n(y)$. However, we have to be careful here, since this choice is only admissible if $x_1$ is not one of the global minimizers of $s_n$. Thus we do not accept this choice if $f(x_1) - \min_{y \in D} s_n(y)$ is smaller than a small tolerance relative to $|f(x_1)|$, provided $f(x_1)$ is nonzero, or smaller than a small absolute tolerance if $f(x_1) = 0$. In these cases, we subtract the same tolerance from $\min_{y \in D} s_n(y)$ to obtain $f_n^*$, to try to obtain a yet lower function value.
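A sketch of this cycling rule follows. It is our paraphrase under the reconstruction above: the handling of the constant-interpolant case and of the tolerance guard at the end of a cycle is omitted, and the index bookkeeping is one possible reading of the rule for $\alpha(n)$.

```python
def cycle_target(n, n0, n_tilde, F_sorted, s_min, N=5):
    """Target value f_n^* within a cycle of length N + 1 starting at
    iteration n_tilde.  F_sorted holds the ordered values
    f(x_1) <= ... <= f(x_n); s_min is min_{y in D} s_n(y)."""
    if n >= n_tilde + N:                       # last step of the cycle: local search
        return s_min
    w = ((N - (n - n_tilde)) / N) ** 2         # weight decreasing from 1 towards 0
    a = len(F_sorted)                          # alpha(n_tilde) = n (largest value)
    for j in range(n_tilde + 1, n + 1):        # alpha(j) = alpha(j-1) - floor((j-n0)/N)
        a -= (j - n0) // N
    return s_min - w * (F_sorted[a - 1] - s_min)
```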
We use thin plate splines as the radial basis function, except for the Hartman 3 function, where we take cubic splines. The algorithm is stopped when the relative error $|f_{\mathrm{best}} - f^*| / |f^*|$ becomes smaller than a fixed $\epsilon$, where $f^*$ is the global optimum and $f_{\mathrm{best}}$ the current best function value. The optimal values of all test functions in Table 1 are nonzero, so this stopping criterion is valid.
Table 2 reports the number of function evaluations needed to achieve a relative error less than 1% and 0.01%. RBF denotes our method using the target value strategy described above. DIRECT (Jones, Perttunen and Stuckman [9]) and MCS (Multilevel Coordinate Search, Huyer and Neumaier [5]) are recent methods that, according to the results presented in those papers, are more efficient than most of their competitors on the Dixon-Szegő testbed. DE (Differential Evolution, Storn and Price [18]) is an evolutionary method that operates only at the global level, which explains the large number of function evaluations. EGO (Efficient Global Optimization, Jones, Schonlau and Welch [10]) is the latest method known to us. Unfortunately, no tests are reported on the Shekel test functions. All the results for these methods are quoted from the papers mentioned above. For the DIRECT method, numbers of evaluations for both the 1% and the 0.01% stopping criterion are reported. For DE and EGO only results for the 1% criterion are available, whereas MCS only uses the 0.01% criterion. It should be noted that MCS uses a local search method at some stages of the algorithm, and in all the cases of Table 2 the first local minimum found is the global one.
7 Conclusions

Our global optimization method converges to the global minimizer of an arbitrary continuous function $f$, if we choose the sequence of target values carefully.
                        error < 1%                   error < 0.01%
Test function      RBF  DIRECT    DE   EGO       RBF  DIRECT   MCS
Branin              44      63   1190    28        64     195    41
Goldstein-Price     63     101   1018    32        76     191    81
Hartman 3           25      83    476    35        79     199    79
Shekel 5            76     103   6400     -       100     155    83
Shekel 7            76      97   6194     -       125     145   129
Shekel 10           51      97   6251     -       112     145   103
Hartman 6          112     213   7220   121       160     571   111

Table 2: Number of function evaluations for our method in comparison to DIRECT, DE, EGO and MCS with two different stopping criteria.
If $f$ is sufficiently smooth, there is even a suitable condition on this sequence that can be checked by the algorithm. However, it is unsatisfactory that the multiquadric and Gaussian cases are excluded from the statement of Theorem 7. It is believed that the convergence result is true also in these cases, although they are not covered by the analysis in [3].

Table 2 shows that the method is able to compete with other global optimization methods on the set of the Dixon-Szegő test functions. The test functions in this testbed, however, are of relatively low dimension, and the number of local and global minima is very small. Therefore, it is necessary to test the method on other sets of test functions and, of course, on real-world applications.

The relation to the P-algorithm is very interesting. It is hoped that the connections can be exploited further. In particular, the choice of the target values and the determination of the point of highest utility are common problems. Solutions to these problems may be developed that are useful for both methods.

Acknowledgement I am very grateful to my supervisor, Prof. M.J.D. Powell, for his constant help and his guidance. Also, I would like to thank the German Academic Exchange Service for supporting this research with a doctoral scholarship (HSP III) and the Engineering and Physical Sciences Research Council for further support with a research grant.


References
[1] P. Alotto, A. Caiti, G. Molinari, and M. Repetto. A Multiquadrics-based Algorithm for the Acceleration of Simulated Annealing Optimization Procedures. IEEE Transactions on Magnetics, 32(3):1198-1201, 1996.
[2] L.C.W. Dixon and G.P. Szegő. The Global Optimization Problem: An Introduction. In L.C.W. Dixon and G.P. Szegő, editors, Towards Global Optimization 2, pages 1-15. North-Holland, Amsterdam, 1978.
[3] H.-M. Gutmann. On the semi-norm of radial basis function interpolants. In preparation.
[4] R. Horst and P.M. Pardalos. Handbook of Global Optimization. Kluwer, Dordrecht, 1994.
[5] W. Huyer and A. Neumaier. Global optimization by multilevel coordinate search. Journal of Global Optimization, 14(4):331-355, 1999.
[6] T. Ishikawa and M. Matsunami. An Optimization Method Based on Radial Basis Functions. IEEE Transactions on Magnetics, 33(2):1868-1871, 1997.
[7] T. Ishikawa, Y. Tsukui, and M. Matsunami. A Combined Method for the Global Optimization Using Radial Basis Function and Deterministic Approach. IEEE Transactions on Magnetics, 35(3):1730-1733, 1999.
[8] D.R. Jones. Global optimization with response surfaces. Presented at the Fifth SIAM Conference on Optimization, Victoria, Canada, 1996.
[9] D.R. Jones, C.D. Perttunen, and B.E. Stuckman. Lipschitz Optimization Without the Lipschitz Constant. Journal of Optimization Theory and Applications, 78(1):157-181, 1993.
[10] D.R. Jones, M. Schonlau, and W.J. Welch. Efficient Global Optimization of Expensive Black-Box Functions. Journal of Global Optimization, 13(4):455-492, 1998.
[11] H.J. Kushner. A Versatile Model of a Function of Unknown and Time Varying Form. Journal of Mathematical Analysis and Applications, 5:150-167, 1962.
[12] H.J. Kushner. A New Method of Locating the Maximum Point of an Arbitrary Multipeak Curve in the Presence of Noise. Journal of Basic Engineering, 86:97-106, 1964.
[13] A.V. Levy and A. Montalvo. The Tunneling Algorithm for the Global Minimization of Functions. SIAM Journal on Scientific and Statistical Computing, 6(1):15-29, 1985.
[14] M.J.D. Powell. Approximation Theory and Methods. Cambridge University Press, 1981.
[15] M.J.D. Powell. The Theory of Radial Basis Function Approximation in 1990. In W.A. Light, editor, Advances in Numerical Analysis, Volume 2: Wavelets, Subdivision Algorithms and Radial Basis Functions, pages 105-210. Oxford University Press, 1992.
[16] M.J.D. Powell. Recent research at Cambridge on radial basis functions. In M.W. Müller, M.D. Buhmann, D.H. Mache, and M. Felten, editors, New Developments in Approximation Theory, International Series of Numerical Mathematics, Vol. 132, pages 215-232. Birkhäuser Verlag, Basel, 1999.
[17] R. Schaback. Comparison of radial basis function interpolants. In K. Jetter and F. Utreras, editors, Multivariate Approximations: From CAGD to Wavelets, pages 293-305. World Scientific, Singapore, 1993.
[18] R. Storn and K. Price. Differential Evolution - A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces. Journal of Global Optimization, 11(4):341-359, 1997.
[19] A. Törn and A. Žilinskas. Global Optimization. Springer, Berlin, 1987.
[20] H. Whitney. Analytic extension of differentiable functions defined in closed sets. Transactions of the American Mathematical Society, 36:63-89, 1934.
[21] A. Žilinskas. Axiomatic Approach to Statistical Models and their Use in Multimodal Optimization Theory. Mathematical Programming, 22(1):104-116, 1982.
[22] A. Žilinskas. Axiomatic Characterization of a Global Optimization Algorithm and Investigation of its Search Strategy. Operations Research Letters, 4(1):35-39, 1985.
