
UNIVERSITY OF CAMBRIDGE

Numerical Analysis Reports

A radial basis function method for global optimization
H.-M. Gutmann

DAMTP 1999/NA22
December, 1999

Department of Applied Mathematics and Theoretical Physics


Silver Street
Cambridge CB3 9EW
England

DAMTP 1999/NA22

A radial basis function method for global optimization


H.-M. Gutmann
Abstract:

We introduce a method that aims to find the global minimum of a continuous nonconvex function on a compact subset of $\mathbb{R}^d$. It is assumed that function evaluations are expensive and that no additional information is available. Radial basis function interpolation is used to define a utility function. The maximizer of this function is the next point where the objective function is evaluated. We show that, for most types of radial basis functions that are considered in this paper, convergence can be achieved without further assumptions on the objective function. Besides, it turns out that our method is closely related to a statistical global optimization method, the P-algorithm. A general framework for both methods is presented. Finally, a few numerical examples show that, on the set of Dixon-Szegő test functions, our method yields favourable results in comparison to other global optimization methods.

Department of Applied Mathematics and Theoretical Physics,


University of Cambridge,
Silver Street,
Cambridge CB3 9EW,
England.
December, 1999.


1 Introduction
Global optimization has attracted a lot of attention in the last 20 years. In
many applications, the objective function is nonlinear and nonconvex. Often,
the number of local minima is large. Therefore standard nonlinear programming
methods may fail to locate the global minimum.
In its most general form, the Global Optimization Problem can be stated as

find $x^* \in D$ such that $f(x^*) \le f(x)$ for all $x \in D$,   (GOP)

where $D \subset \mathbb{R}^d$ is compact and $f : D \to \mathbb{R}$ is a continuous function defined on $D$. Under these assumptions, (GOP) is solvable, because $f$ attains its minimum on $D$.
Numerous methods to solve (GOP) have been developed (see e.g. Horst and Pardalos [4] and Törn and Žilinskas [19]). Stochastic methods like simulated annealing and genetic algorithms, which use only function values, are very popular among users, although their rate of convergence is usually rather slow. Deterministic methods like Branch-and-Bound, however, assume that one can compute a lower bound of $f$ on a subset of $D$. This can be done, for example, when a Lipschitz constant of $f$ is available. These further assumptions make such methods very powerful, but often they are not satisfied, or it is too expensive to provide the necessary information.
For the method investigated in this paper, we have in mind problems where the only information available is the possibility to evaluate the objective function, and each evaluation is very expensive. This may mean that it takes several hours to calculate a function value. For example, a function evaluation at a point may be done by building an experiment, by running a long computer simulation or by using a finite element method. Therefore, as the duration of the optimization process is dominated by the function evaluations, our goal is to require as few function evaluations as possible to find an adequate estimate of the global minimum.
The method is based on a general technique proposed by Jones [8]. Let $\mathcal{A}$ be a linear space of functions, and assume that, for $s \in \mathcal{A}$, $\sigma(s)$ is a measure of the "bumpiness" of $s$. Now assume that we have calculated $x_1, \ldots, x_n$ and the function values $f(x_1), \ldots, f(x_n)$. A target value $f^*$ is chosen that can be regarded as an estimate of the optimal value, but it might be very crude. For each $y \notin \{x_1, \ldots, x_n\}$, let $s_y \in \mathcal{A}$ be defined by the interpolation conditions
$$ s_y(x_i) = f(x_i), \quad i = 1, \ldots, n, \qquad s_y(y) = f^*. $$   (1.1)
The new point $x_{n+1}$ is chosen to be the value of $y$ that minimizes $\sigma(s_y)$, $y \notin \{x_1, \ldots, x_n\}$. Thus the view is taken that the "least bumpy" of the functions $s_y$ yields the most useful location of the new point.


We will use radial basis functions as interpolants. Their interpolation properties are very suitable. Specifically, the uniqueness of an interpolant is achieved under very mild conditions on the location of the interpolation points, and a measure of bumpiness is also available.

Close relations can be established between our method and one from statistical global optimization, namely the P-algorithm (Žilinskas [22]). Although it is derived using a completely different approach, it is very similar to our method. One special case of a P-algorithm, developed by Kushner [12], is even equivalent to a special case of our radial basis function method.

Other global optimization methods based on radial basis functions have been developed. Alotto et al. [1] use interpolation by multiquadrics to accelerate a simulated annealing method. Ishikawa et al. [6], [7] employ radial basis functions to estimate the global minimizer and run an SQP algorithm to locate it.

The properties of radial basis functions that are necessary for the description of our method are introduced in the following section. In particular, we address the question of interpolation and introduce a suitable measure of "bumpiness". The global optimization method is described in detail in Section 3. Convergence of the method is the subject of Section 4. Subsection 4.1 contains various convergence results, and the proof of the main theorem can be found in Subsection 4.2. The relation between our method and the P-algorithm is addressed in Section 5. The final section deals with search strategies, but a complete analysis is beyond the scope of this paper.

2 Interpolation by radial basis functions and a measure of bumpiness

Let $n$ pairwise different points $x_1, \ldots, x_n \in \mathbb{R}^d$ and data $f_1, \ldots, f_n \in \mathbb{R}$ be given, where $n$ and $d$ are any positive integers. We seek a function $s$ of the form
$$ s(x) = \sum_{i=1}^{n} \lambda_i \, \phi(\|x - x_i\|) + p(x), \quad x \in \mathbb{R}^d, $$   (2.1)
that interpolates the data $(x_1, f_1), \ldots, (x_n, f_n)$. The coefficients $\lambda_i,\ i = 1, \ldots, n$, are real numbers, and $p$ is from $\Pi_m$, the space of polynomials of degree less than or equal to $m$. The norm $\|\cdot\|$ is the Euclidean norm in $\mathbb{R}^d$. The following choices of $\phi$ are considered:
$$ \begin{aligned} \phi(r) &= r && \text{(linear)}, \\ \phi(r) &= r^3 && \text{(cubic)}, \\ \phi(r) &= r^2 \log r && \text{(thin plate spline)}, \\ \phi(r) &= \sqrt{r^2 + \gamma^2} && \text{(multiquadric)}, \\ \phi(r) &= e^{-\gamma r^2} && \text{(Gaussian)}, \end{aligned} \qquad r \ge 0, $$   (2.2)

where $\gamma$ is a prescribed positive constant. Let the matrix $\Phi \in \mathbb{R}^{n \times n}$ be defined by
$$ (\Phi)_{ij} := \phi(\|x_i - x_j\|), \quad i, j = 1, \ldots, n. $$   (2.3)
Further, we introduce the linear space $V_m \subseteq \mathbb{R}^n$ containing all $\lambda \in \mathbb{R}^n$ that satisfy
$$ \sum_{i=1}^{n} \lambda_i \, q(x_i) = 0 \quad \forall\, q \in \Pi_m. $$   (2.4)
Formally, we set $V_{-1} := \mathbb{R}^n$. Obviously, $V_{m+1} \subseteq V_m$ for all $m \ge -1$. Powell [15] shows that, in the cubic and thin plate spline cases,
$$ \lambda^T \Phi \lambda > 0 \quad \forall\, \lambda \in V_1 \setminus \{0\}, $$   (2.5)
in the linear and multiquadric cases,
$$ \lambda^T \Phi \lambda < 0 \quad \forall\, \lambda \in V_0 \setminus \{0\}, $$   (2.6)
and in the Gaussian case
$$ \lambda^T \Phi \lambda > 0 \quad \forall\, \lambda \in \mathbb{R}^n \setminus \{0\}. $$   (2.7)
We let $m_0$ be 1 in the cubic and thin plate spline cases, 0 in the linear and multiquadric cases, and $-1$ in the Gaussian case. Then the inequalities (2.5)--(2.7) can be merged into
$$ (-1)^{m_0+1} \, \lambda^T \Phi \lambda > 0 \quad \forall\, \lambda \in V_{m_0} \setminus \{0\}. $$   (2.8)
After choosing $\phi$, we let $m$ be an integer that is not less than $m_0$, and $\lambda$ is confined to $V_m$.
Let $\hat m$ be the dimension of $\Pi_m$, let $p_1, \ldots, p_{\hat m}$ be a basis of this linear space, and let $P$ be the matrix
$$ P := \begin{pmatrix} p_1(x_1) & \cdots & p_{\hat m}(x_1) \\ \vdots & & \vdots \\ p_1(x_n) & \cdots & p_{\hat m}(x_n) \end{pmatrix}. $$   (2.9)
Then it can be shown (see [15]) that the matrix
$$ A = \begin{pmatrix} \Phi & P \\ P^T & 0 \end{pmatrix} \in \mathbb{R}^{(n+\hat m) \times (n+\hat m)} $$   (2.10)
is nonsingular if and only if $x_1, \ldots, x_n$ satisfy
$$ q \in \Pi_m \ \text{ and } \ q(x_i) = 0, \ i = 1, \ldots, n, \ \Longrightarrow \ q \equiv 0. $$   (2.11)


In the Gaussian case with $m = -1$, $P$ and condition (2.11) are omitted. Therefore the coefficients of the function $s$ in (2.1) are defined uniquely by the system
$$ s(x_i) = f_i, \quad i = 1, \ldots, n, \qquad \sum_{i=1}^{n} \lambda_i \, p_j(x_i) = 0, \quad j = 1, \ldots, \hat m. $$   (2.12)
Let $F$ be the vector whose entries are the data values $f_1, \ldots, f_n$. Then the system (2.12) becomes
$$ \begin{pmatrix} \Phi & P \\ P^T & 0 \end{pmatrix} \begin{pmatrix} \lambda \\ c \end{pmatrix} = \begin{pmatrix} F \\ 0_{\hat m} \end{pmatrix}, $$   (2.13)
where $\lambda = (\lambda_1, \ldots, \lambda_n)^T \in \mathbb{R}^n$, $c \in \mathbb{R}^{\hat m}$ and $0_{\hat m}$ is the zero vector in $\mathbb{R}^{\hat m}$. The components of $c$ are the coefficients of the polynomial $p$ with respect to the basis $p_1, \ldots, p_{\hat m}$.
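For concreteness, the following sketch assembles and solves the linear system (2.13) in the cubic case ($\phi(r) = r^3$, $m = 1$). It is an illustration of ours, not part of the paper; the function name `rbf_interpolant` and the use of a dense solver are our own choices.

```python
import numpy as np

def rbf_interpolant(X, f, phi=lambda r: r**3):
    """Solve system (2.13) for the coefficients (lambda, c) of the
    interpolant s(x) = sum_i lambda_i phi(||x - x_i||) + p(x),
    with p a linear polynomial (m = 1, so m_hat = d + 1)."""
    n, d = X.shape
    # Phi_ij = phi(||x_i - x_j||), definition (2.3)
    Phi = phi(np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2))
    # rows of P are (1, x_i^T), a basis of Pi_1, definition (2.9)
    P = np.hstack([np.ones((n, 1)), X])
    m_hat = d + 1
    A = np.block([[Phi, P], [P.T, np.zeros((m_hat, m_hat))]])
    rhs = np.concatenate([f, np.zeros(m_hat)])
    coef = np.linalg.solve(A, rhs)
    lam, c = coef[:n], coef[n:]

    def s(y):
        r = np.linalg.norm(X - y, axis=1)
        return phi(r) @ lam + c[0] + c[1:] @ y
    return s, lam, c
```

Any other $\phi$ from (2.2), together with a basis of the corresponding $\Pi_m$, could be substituted in the same way.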
The motivation for the measurement of the bumpiness of a radial basis function interpolant can be developed from the theory of natural cubic splines in one dimension. They can be written in the form (2.1), where $\phi(r) = r^3$, $\lambda \in V_1$ and $p \in \Pi_1$. It is well known (e.g. Powell [14]) that the interpolant $s$ that is defined by the system (2.12) minimizes $I(g) := \int_{\mathbb{R}} [g''(x)]^2 \, dx$ among all functions $g : \mathbb{R} \to \mathbb{R}$ that satisfy the interpolation conditions $g(x_i) = f_i,\ i = 1, \ldots, n$, and for which $I(g)$ exists and is finite. Therefore $I(g)$ is a suitable measure of bumpiness. The second derivative $s''$ is piecewise linear and vanishes outside a bounded interval. Thus one obtains by integration by parts
$$ I(s) = \int_{\mathbb{R}} [s''(x)]^2 \, dx = 12 \sum_{i=1}^{n} \lambda_i \, s(x_i) = 12 \sum_{i=1}^{n} \lambda_i \Big( \sum_{j=1}^{n} \lambda_j |x_i - x_j|^3 + p(x_i) \Big) = 12 \sum_{i,j=1}^{n} \lambda_i \lambda_j \, \phi_{ij}, $$
where the last equation follows from $\lambda \in V_1$. This relation suggests that expression (2.8) can provide a semi-inner product and a semi-norm for each $\phi$ in (2.2) and $m \ge m_0$. Also, the semi-norm will be the measure of bumpiness of a radial basis function (2.1). A semi-inner product $\langle \cdot, \cdot \rangle$ satisfies the same properties as an inner product, except that $\langle s, s \rangle = 0$ need not imply $s = 0$. Similarly, for a semi-norm $\|\cdot\|$, $\|s\| = 0$ does not imply $s = 0$.
We choose any radial basis function from (2.2) and $m \ge m_0$, and we define $A_{\phi,m}$ to be the linear space of all functions of the form
$$ \sum_{i=1}^{N} \lambda_i \, \phi(\|x - y_i\|) + p(x), \quad x \in \mathbb{R}^d, $$
where $N \in \mathbb{N}$, $y_1, \ldots, y_N \in \mathbb{R}^d$, $p \in \Pi_m$, and $\lambda = (\lambda_1, \ldots, \lambda_N)^T$ satisfies (2.4) for $n = N$. On this space, the semi-inner product and the semi-norm are defined as follows.

Let $s$ and $u$ be any functions in $A_{\phi,m}$, i.e.
$$ s(x) = \sum_{i=1}^{N(s)} \lambda_i \, \phi(\|x - y_i\|) + p(x) \quad \text{and} \quad u(x) = \sum_{j=1}^{N(u)} \mu_j \, \phi(\|x - z_j\|) + q(x). $$
We let the semi-inner product be the expression
$$ \langle s, u \rangle := (-1)^{m_0+1} \sum_{i=1}^{N(s)} \lambda_i \, u(y_i). $$   (2.14)
Clearly, it is bilinear. To show symmetry, we use
$$ \sum_{i=1}^{N(s)} \lambda_i \, q(y_i) = 0 \quad \text{and} \quad \sum_{j=1}^{N(u)} \mu_j \, p(z_j) = 0 $$
to deduce
$$ \begin{aligned} \langle s, u \rangle &= (-1)^{m_0+1} \sum_{i=1}^{N(s)} \lambda_i \Big( \sum_{j=1}^{N(u)} \mu_j \, \phi(\|y_i - z_j\|) + q(y_i) \Big) \\ &= (-1)^{m_0+1} \sum_{i=1}^{N(s)} \sum_{j=1}^{N(u)} \lambda_i \mu_j \, \phi(\|y_i - z_j\|) \\ &= (-1)^{m_0+1} \sum_{j=1}^{N(u)} \mu_j \Big( \sum_{i=1}^{N(s)} \lambda_i \, \phi(\|z_j - y_i\|) + p(z_j) \Big) \\ &= (-1)^{m_0+1} \sum_{j=1}^{N(u)} \mu_j \, s(z_j) = \langle u, s \rangle. \end{aligned} $$
By (2.8),
$$ \langle s, s \rangle = (-1)^{m_0+1} \sum_{i=1}^{N(s)} \lambda_i \, s(y_i) = (-1)^{m_0+1} \sum_{i,j=1}^{N(s)} \lambda_i \lambda_j \, \phi(\|y_i - y_j\|) $$   (2.15)
is strictly positive if $\lambda \in V_m \setminus \{0\}$ and $m \ge m_0$, i.e. if $s \in A_{\phi,m}$ is not a polynomial in $\Pi_m$. Thus (2.14) is a semi-inner product on $A_{\phi,m}$ that induces the semi-norm $\langle s, s \rangle^{1/2}$ with null space $\Pi_m$ (for details see Powell [16] and Schaback [17]).
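As an aside of ours, for an interpolant with coefficients $\lambda \in V_m$ the polynomial terms in (2.15) drop out, so the squared semi-norm reduces to $(-1)^{m_0+1}\lambda^T\Phi\lambda$. A minimal sketch, under the same cubic-case assumptions as above:

```python
import numpy as np

def seminorm_sq(X, lam, phi=lambda r: r**3, m0=1):
    """Squared semi-norm <s, s> of an interpolant, expression (2.15).
    Since lam lies in V_m, the polynomial part is annihilated and only
    (-1)^(m0+1) * lam^T Phi lam remains."""
    Phi = phi(np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2))
    return (-1) ** (m0 + 1) * lam @ Phi @ lam
```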
In analogy to the variational principle for cubic splines in one dimension, mentioned above, there is a theorem which states that the given interpolant is the solution to a minimization problem.

Theorem 1 (Schaback [17]) Let $\phi$ be any radial basis function from (2.2), and let $m$ be chosen such that $m \ge m_0$. Given are points $x_1, \ldots, x_n$ in $\mathbb{R}^d$ having the property (2.11) and values $f_1, \ldots, f_n$ in $\mathbb{R}$. Let $s$ be the radial function of the form (2.1) that solves the system (2.12). Then $s$ minimizes the semi-norm $\langle g, g \rangle^{1/2}$ on the set of functions $g \in A_{\phi,m}$ that satisfy
$$ g(x_i) = f_i, \quad i = 1, \ldots, n. $$   (2.16)

3 A radial basis function method


It will be shown how radial basis functions can be used in the general method of Jones [8] to solve the problem (GOP) (cf. Powell [16]). As in Section 2, we pick $\phi$ from (2.2) and $m \ge m_0$. Let $p_1, \ldots, p_{\hat m}$ be a basis of $\Pi_m$, where $\hat m = \dim \Pi_m$. Assume we have chosen $x_1, \ldots, x_n \in D$ that satisfy (2.11), and we know the function values $f(x_1), \ldots, f(x_n)$. Let the function
$$ s_n(x) = \sum_{i=1}^{n} \lambda_i \, \phi(\|x - x_i\|) + p(x), \quad x \in \mathbb{R}^d, $$
interpolate $(x_1, f(x_1)), \ldots, (x_n, f(x_n))$. Our task is to determine $x_{n+1}$. For a target value $f_n^*$ and a point $y \in D \setminus \{x_1, \ldots, x_n\}$, the radial basis function $s_y$ that satisfies (1.1) can be written as
$$ s_y(x) = s_n(x) + [f_n^* - s_n(y)] \, \ell_n(y, x), \quad x \in \mathbb{R}^d, $$   (3.1)
where $\ell_n(y, x)$ is the radial basis function solution to the interpolation conditions
$$ \ell_n(y, x_i) = 0, \quad i = 1, \ldots, n, \qquad \ell_n(y, y) = 1. $$   (3.2)

Therefore $\ell_n(y, \cdot)$ can be expressed as
$$ \ell_n(y, x) = \sum_{i=1}^{n} \alpha_i(y) \, \phi(\|x - x_i\|) + \mu_n(y) \, \phi(\|x - y\|) + \sum_{i=1}^{\hat m} b_i(y) \, p_i(x), \quad x \in \mathbb{R}^d, $$   (3.3)
where the coefficients of $\ell_n(y, \cdot)$ are defined by the equations
$$ A(y) \begin{pmatrix} \alpha(y) \\ \mu_n(y) \\ b(y) \end{pmatrix} = \begin{pmatrix} 0_n \\ 1 \\ 0_{\hat m} \end{pmatrix}. $$   (3.4)
Here $\alpha(y) = (\alpha_1(y), \ldots, \alpha_n(y))^T \in \mathbb{R}^n$, $b(y) = (b_1(y), \ldots, b_{\hat m}(y))^T \in \mathbb{R}^{\hat m}$, $\mu_n(y) \in \mathbb{R}$, $0_n$ and $0_{\hat m}$ denote the zero vectors in $\mathbb{R}^n$ and $\mathbb{R}^{\hat m}$, respectively, and, as in equation (2.10), $A(y)$ has the form
$$ A(y) := \begin{pmatrix} \Phi(y) & P(y) \\ P(y)^T & 0 \end{pmatrix}. $$   (3.5)
Specifically, letting $u(y)$ and $\pi(y)$ be the vectors
$$ u(y) := \big( \phi(\|y - x_1\|), \ldots, \phi(\|y - x_n\|) \big)^T \quad \text{and} \quad \pi(y) := \big( p_1(y), \ldots, p_{\hat m}(y) \big)^T, $$
respectively, $\Phi(y)$ and $P(y)$ are the matrices
$$ \Phi(y) := \begin{pmatrix} \Phi & u(y) \\ u(y)^T & \phi(0) \end{pmatrix} \quad \text{and} \quad P(y) := \begin{pmatrix} P \\ \pi(y)^T \end{pmatrix}. $$   (3.6)

The square of the semi-norm $\langle s_y, s_y \rangle$ of the new interpolant (3.1), as defined in the previous section, has the value
$$ \langle s_y, s_y \rangle = \langle s_n, s_n \rangle + 2 [f_n^* - s_n(y)] \langle s_n, \ell_n(y, \cdot) \rangle + [f_n^* - s_n(y)]^2 \langle \ell_n(y, \cdot), \ell_n(y, \cdot) \rangle. $$
Equations (2.14) and (3.2) imply
$$ \langle s_n, \ell_n(y, \cdot) \rangle = (-1)^{m_0+1} \sum_{i=1}^{n} \lambda_i \, \ell_n(y, x_i) = 0, $$
and, using expressions (3.2) and (3.3), we find the expression
$$ \langle \ell_n(y, \cdot), \ell_n(y, \cdot) \rangle = (-1)^{m_0+1} \Big[ \sum_{i=1}^{n} \alpha_i(y) \, \ell_n(y, x_i) + \mu_n(y) \, \ell_n(y, y) \Big] = (-1)^{m_0+1} \mu_n(y). $$   (3.7)
Thus we deduce the formula
$$ \langle s_y, s_y \rangle = \langle s_n, s_n \rangle + (-1)^{m_0+1} \mu_n(y) \, [f_n^* - s_n(y)]^2. $$
Further, we define the function $g_n : D \setminus \{x_1, \ldots, x_n\} \to \mathbb{R}$ as the difference
$$ g_n(y) := \langle s_y, s_y \rangle - \langle s_n, s_n \rangle = (-1)^{m_0+1} \mu_n(y) \, [f_n^* - s_n(y)]^2, $$
which is nonnegative. Since $\langle s_n, s_n \rangle$ is independent of $y$, the required minimization of $\langle s_y, s_y \rangle$ and the minimization of $g_n(y)$ are equivalent.
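To make these quantities concrete, the sketch below obtains $\mu_n(y)$ from the bordered system (3.4) and then evaluates $g_n(y)$. It is only an illustration of the formulas under the same cubic-case assumptions as the earlier sketch; the names `mu_and_g` and the choice of a dense solver are ours.

```python
import numpy as np

def mu_and_g(X, lam, c, f_target, y, phi=lambda r: r**3, m0=1):
    """Solve the bordered system (3.4) for the cardinal function l_n(y, .)
    and return mu_n(y) together with g_n(y) of (3.8)."""
    n, d = X.shape
    Phi = phi(np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2))
    P = np.hstack([np.ones((n, 1)), X])            # basis of Pi_1
    u = phi(np.linalg.norm(X - y, axis=1))         # u(y) in (3.6)
    pi = np.concatenate([[1.0], y])                # pi(y) in (3.6)
    # A(y) from (3.5): Phi and P bordered by the row and column for y
    Phi_y = np.block([[Phi, u[:, None]], [u[None, :], [[phi(0.0)]]]])
    P_y = np.vstack([P, pi])
    A_y = np.block([[Phi_y, P_y], [P_y.T, np.zeros((d + 1, d + 1))]])
    rhs = np.zeros(n + 1 + d + 1)
    rhs[n] = 1.0                                   # enforces l_n(y, y) = 1
    coef = np.linalg.solve(A_y, rhs)
    mu = coef[n]                                   # mu_n(y)
    s_y = u @ lam + c[0] + c[1:] @ y               # s_n(y)
    g = (-1) ** (m0 + 1) * mu * (s_y - f_target) ** 2
    return mu, g
```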

The choice of $f_n^*$ determines the location of $x_{n+1}$. If
$$ f_n^* \ge \min_{y \in D} s_n(y), $$
then $g_n(y) = 0$ can be achieved. However, if
$$ f_n^* < \min_{y \in D} s_n(y), $$
then $x_{n+1}$ will be away from the points $x_i$, $i = 1, \ldots, n$. In particular, for $f_n^* \to -\infty$, we make the following deduction.

Remark 2 For $f_n^* < \min_{y \in D} s_n(y)$, let $x(f_n^*)$ be the minimizer of $g_n$, i.e.
$$ (-1)^{m_0+1} \mu_n(x(f_n^*)) \, [s_n(x(f_n^*)) - f_n^*]^2 \le (-1)^{m_0+1} \mu_n(y) \, [s_n(y) - f_n^*]^2 \quad \forall\, y \in D \setminus \{x_1, \ldots, x_n\}. $$
This is equivalent to
$$ (-1)^{m_0+1} \mu_n(x(f_n^*)) \le (-1)^{m_0+1} \mu_n(y) \left[ 1 + \frac{s_n(y) - s_n(x(f_n^*))}{s_n(x(f_n^*)) - f_n^*} \right]^2. $$
As $f_n^* \to -\infty$, the boundedness of $s_n$ on $D$ implies
$$ (-1)^{m_0+1} \mu_n(x(-\infty)) \le (-1)^{m_0+1} \mu_n(y) \quad \forall\, y \in D \setminus \{x_1, \ldots, x_n\}. $$
Therefore, the choice $f_n^* = -\infty$ requires the minimization of $(-1)^{m_0+1} \mu_n(y)$. This process puts $x_{n+1}$ in a large gap between the points $x_i$, $i = 1, \ldots, n$, a property that is of fundamental importance to global optimization.

The following basic algorithm employs the given method.

Algorithm 3

Initial step: Pick $\phi$ from (2.2) and $m \ge m_0$. Choose points $x_1, \ldots, x_n \in D$ that satisfy (2.11). Compute the radial function $s_n$ that minimizes $\langle s, s \rangle$ on $A_{\phi,m}$, subject to the interpolation conditions
$$ s(x_i) = f(x_i), \quad i = 1, \ldots, n. $$

Iteration step: $x_1, \ldots, x_n$ are the points in $D$ where the value of $f$ is known, and $s_n$ minimizes $\langle s, s \rangle$, subject to $s(x_i) = f(x_i)$, $i = 1, \ldots, n$.
Choose a target value $f_n^* \in [-\infty, \min_{y \in D} s_n(y)]$. (The choice $f_n^* = \min_{y \in D} s_n(y)$ is admissible only if none of the $x_i$ is a global minimizer of $s_n$.)
Calculate $x_{n+1}$, which is the value of $y$ that minimizes the function
$$ g_n(y) = (-1)^{m_0+1} \mu_n(y) \, [s_n(y) - f_n^*]^2, \quad y \in D \setminus \{x_1, \ldots, x_n\}. $$   (3.8)
Evaluate $f$ at $x_{n+1}$ and set $n := n + 1$.
Stop if $n$ is greater than a prescribed $n_{\max}$.
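A high-level sketch of this loop may help to fix ideas. It is an illustration of ours, not a reference implementation: the interpolation solve and the two global subproblems (minimizing $s_n$ over $D$ and minimizing $g_n$) are delegated to hypothetical callables that would have to be supplied, for example by a multistart local optimizer, and the target-value rule is left open (Section 6 discusses one strategy).

```python
def rbf_global_optimize(f, initial_points, n_max,
                        fit_interpolant, choose_target,
                        minimize_on_D, argmin_g):
    """Sketch of Algorithm 3.  `f` is the expensive objective,
    `initial_points` satisfy (2.11); the four callables encapsulate the
    subproblems that the paper leaves to the implementation."""
    X = list(initial_points)
    F = [f(x) for x in X]
    while len(X) < n_max:
        s_n = fit_interpolant(X, F)            # solve system (2.13)
        s_min = minimize_on_D(s_n)             # min_{y in D} s_n(y)
        f_target = choose_target(s_min, F)     # value in [-inf, s_min]
        x_next = argmin_g(s_n, X, f_target)    # minimize g_n of (3.8)
        X.append(x_next)
        F.append(f(x_next))                    # the only expensive step
    i_best = min(range(len(F)), key=F.__getitem__)
    return X[i_best], F[i_best]
```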

The function $g_n$ is infinitely differentiable on $D \setminus \{x_1, \ldots, x_n\}$, but it is not defined at the interpolation points. If $f_n^* = \min_{y \in D} s_n(y)$ and if $s_n(x_i) > f_n^*$, $i = 1, \ldots, n$, then the global minimizers of $g_n$ are the global minimizers of $s_n$. Thus one can minimize $s_n$, which is defined on the whole of $D$, to obtain $x_{n+1}$. If $f_n^* < \min_{y \in D} s_n(y)$, however, then $g_n(x)$ tends to infinity as $x$ tends to $x_i$, $i = 1, \ldots, n$. Let $h_n : D \to \mathbb{R}$ be defined as
$$ h_n(y) := \begin{cases} \dfrac{1}{g_n(y)}, & y \notin \{x_1, \ldots, x_n\}, \\[1ex] 0, & y = x_i, \ i = 1, \ldots, n. \end{cases} $$   (3.9)
The maximization of $h_n$ on $D$ is equivalent to the minimization of $g_n$. Further, $h_n$ is infinitely differentiable on $D \setminus \{x_1, \ldots, x_n\}$. It can also be shown, using the system (3.4), that it is in $C^0(D)$ in the linear case, in $C^1(D)$ in the thin plate spline case, in $C^2(D)$ in the cubic case, and in $C^\infty(D)$ in the multiquadric and Gaussian cases.

Under certain conditions on $f$ and the values $f_n^*$, $n \in \mathbb{N}$, it can be proved that a subsequence of the generated points $(x_n)_{n \in \mathbb{N}}$ converges to a global minimum. This is the subject of the following section.
1

4 Convergence of the method

Our aim is to prove convergence of the method for any continuous function $f$. A theorem by Törn and Žilinskas [19] tells us that the sequence that is generated by Algorithm 3 should be dense. Applied to our method, it states

Theorem 4 The algorithm converges for every continuous function $f$ if and only if it generates a sequence of points that is dense in $D$.

So our task is to establish the density of the sequence of generated points. Our main convergence result (Theorem 7), however, does not allow all choices of $\phi$ and $m$. The first part of this section states Theorem 7 and derives a few corollaries. The second part gives the proof of the theorem.

4.1 Convergence results

Assume that $f_n^*$ is set to $\min_{y \in D} s_n(y)$ on each iteration and that this choice is admissible. Then, if there are large function values in a neighbourhood $U$ of a global minimizer of $f$, the global minimizer of $s_n$ might be outside $U$ for every $n \ge n_0$, where $n_0$ is the number of initial points. In this case, $U$ is not explored and the global optimum of $f$ might be missed. Therefore, we have to assume that enough of the numbers $\min_{y \in D} s_n(y) - f_n^*$ are sufficiently large. Specifically, let $\epsilon > 0$ and $\rho \ge 0$ be constants, where additionally $\rho < 1$ in the linear case and $\rho < 2$ in the thin plate spline and cubic cases, and define
$$ \Delta_n := \min_{i \le n-1} \|x_n - x_i\|. $$   (4.1)
Then the condition
$$ \min_{y \in D} s_n(y) - f_n^* > \epsilon \, \Delta_n^{\rho/2} \, \|s_n\|_\infty, $$   (4.2)
for infinitely many $n \in \mathbb{N}$, where $\|\cdot\|_\infty$ denotes the maximum norm of a function on $D$, will lead to the required result. We note that the norms $\|s_n\|_\infty$ may diverge as $n \to \infty$.
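For illustration, the quantities in (4.1) and (4.2) are cheap to evaluate once the interpolant is available. The hedged sketch below computes $\Delta_n$ and an upper bound on admissible target values; it assumes that $\min_{y\in D} s_n(y)$ and an estimate of $\|s_n\|_\infty$ (obtained, say, by sampling) are supplied by the caller.

```python
import numpy as np

def target_upper_bound(X, s_min, s_sup, eps=1.0, rho=0.5):
    """Delta_n from (4.1) and the bound in condition (4.2): any target
    f_n^* strictly below  s_min - eps * Delta_n**(rho/2) * s_sup
    satisfies (4.2).  X holds all points so far, most recent last;
    s_sup is an estimate of ||s_n||_inf on D."""
    x_new, X_old = X[-1], X[:-1]
    delta_n = np.min(np.linalg.norm(X_old - x_new, axis=1))
    return s_min - eps * delta_n ** (rho / 2) * s_sup
```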
Unfortunately, the choice of $\phi$ and $m$ is restricted. In the proof of Theorem 7 we need the result that, for any $y \in D$ and any neighbourhood $U$ of $y$, $(-1)^{m_0+1} \mu_n(y)$ can be bounded above by a number that does not depend on $n$, if none of the points $x_1, \ldots, x_n$ is in $U$. This condition is achieved if there is a function that takes the value 1 at $y$, that is identically zero outside $U$, and that is in the function space $N_{\phi,m}(\mathbb{R}^d)$ as defined below.

Definition 5 Let $\phi$ from (2.2) and $m \ge m_0$ be given. A continuous function $F : D \to \mathbb{R}$, $D \subseteq \mathbb{R}^d$, belongs to the function space $N_{\phi,m}(D)$ if there exists a positive constant $C$ such that, for any choice of interpolation points $x_1, \ldots, x_n \in D$ for which (2.11) holds, the interpolant $s_n \in A_{\phi,m}$ to $F$ at these points has the property
$$ \langle s_n, s_n \rangle \le C. $$
The characterization of $N_{\phi,m}(D)$ is rather abstract. In the linear, cubic and thin plate spline cases, the following proposition, which is taken from Gutmann [3], provides a useful criterion to check whether it is satisfied. In the multiquadric and Gaussian cases, however, no such criterion is known.

Proposition 6 Let $\phi(r) = r$, $\phi(r) = r^2 \log r$ or $\phi(r) = r^3$. Further, let $\kappa = 1$ in the linear case, $\kappa = 2$ in the thin plate spline case and $\kappa = 3$ in the cubic case, and choose the integer $m$ such that $0 \le m \le d$ in the linear case, $1 \le m \le d+1$ in the thin plate spline case and $1 \le m \le d+2$ in the cubic case. Define $\nu := (d+\kappa)/2$ if $d+\kappa$ is even, and $\nu := (d+\kappa+1)/2$ otherwise. If $F \in C^\nu(\mathbb{R}^d)$ and has bounded support, then $F \in N_{\phi,m}(\mathbb{R}^d)$.

Global convergence will be established only for the cases covered by this proposition. It remains an open problem whether a similar property is achieved in other cases. Thus we have the following theorem.

Theorem 7 Let $\phi(r) = r$, $\phi(r) = r^2 \log r$ or $\phi(r) = r^3$. Further, choose the integer $m$ such that $0 \le m \le d$ in the linear case, $1 \le m \le d+1$ in the thin plate spline case and $1 \le m \le d+2$ in the cubic case. Let $(x_n)_{n \in \mathbb{N}}$ be the sequence generated by Algorithm 3, and let $s_n$ be the radial function that interpolates $(x_i, f(x_i))$, $i = 1, \ldots, n$. Assume that, for infinitely many $n \in \mathbb{N}$, the choice of $f_n^*$ satisfies (4.2). Then the sequence $(x_n)$ is dense in $D$.

A particular convergence result follows immediately from Theorems 4 and 7, because the right-hand side of (4.2) is some real number.

Corollary 8 Let the assumptions of Theorem 7 on $\phi$ and $m$ hold. Further, let $f$ be continuous, and, for infinitely many $n \in \mathbb{N}$, let $f_n^* = -\infty$. Then the method converges.

An interesting question is to find conditions on $f$ such that the maximum norm of an interpolant is uniformly bounded. If they hold, then the right-hand side of (4.2) can be replaced by $\epsilon \, \Delta_n^{\rho/2}$, so this constraint on $f_n^*$ can be checked easily. We consider the special case of linear splines in one dimension, when $d = 1$, $\phi(r) = r$ and $m = 0$. For arbitrary points $x_1, \ldots, x_n$, the piecewise linear interpolant $s_n$ attains its maximum and minimum values at interpolation points. Thus $\|s_n\|_\infty$ is bounded by $\|f\|_\infty$, a number that does not depend on the interpolation points. Therefore, in this case the term $\|s_n\|_\infty$ may be dropped from (4.2).

For other radial basis functions and other dimensions this simplification may fail. It is shown in the next lemma, however, that the uniform boundedness of the semi-norm of an interpolant is sufficient for the uniform boundedness of the maximum norm. Thus, the second convergence result applies to functions $f$ in $N_{\phi,m}(D)$.

Lemma 9 Let $f$ be in $N_{\phi,m}(D)$. Further, let $(x_n)_{n \in \mathbb{N}}$ be a sequence in $D$ with pairwise different elements, such that (2.11) holds for $n = n_0$. For $n \ge n_0$, denote the radial basis function interpolant to $f$ at $x_1, \ldots, x_n$ by $s_n$. Then $\|s_n\|_\infty$ is uniformly bounded by a number that only depends on $x_1, \ldots, x_{n_0}$.

Proof:
We fix $n$, and we let $y$ be any point of $D \setminus \{x_1, \ldots, x_n\}$. Let $\tilde s_n$ be the radial function that interpolates $(y, f(y))$ and $(x_i, f(x_i))$, $i = 1, \ldots, n$. By analogy with equation (3.1), it can be written as
$$ \tilde s_n(x) = s_n(x) + [f(y) - s_n(y)] \, \ell_n(y, x), \quad x \in \mathbb{R}^d, $$
where $\ell_n(y, \cdot)$ is still the cardinal function that interpolates $(x_i, 0)$, $i = 1, \ldots, n$, and $(y, 1)$. Thus, as shown in Section 3,
$$ \langle \tilde s_n, \tilde s_n \rangle = \langle s_n, s_n \rangle + [f(y) - s_n(y)]^2 \, (-1)^{m_0+1} \mu_n(y), $$
which gives the equation
$$ [f(y) - s_n(y)]^2 = \frac{\langle \tilde s_n, \tilde s_n \rangle - \langle s_n, s_n \rangle}{(-1)^{m_0+1} \mu_n(y)}, $$   (4.3)
the value of $(-1)^{m_0+1} \mu_n(y)$ being strictly positive.

Next we show that $(-1)^{m_0+1} \mu_n(y)$ is bounded away from zero. Let $\ell_{n_0}(y, \cdot)$ be the cardinal function that interpolates $(x_1, 0), \ldots, (x_{n_0}, 0)$ and $(y, 1)$. Then the semi-norm properties of $\langle \cdot, \cdot \rangle$ and Theorem 1 imply
$$ 0 < (-1)^{m_0+1} \mu_{n_0}(y) = \langle \ell_{n_0}(y, \cdot), \ell_{n_0}(y, \cdot) \rangle \le \langle \ell_n(y, \cdot), \ell_n(y, \cdot) \rangle = (-1)^{m_0+1} \mu_n(y). $$
For $n = n_0$, let $A$ and $A(y)$ be the matrices (2.10) and (3.5), respectively. By using Cramer's rule to solve (3.4), we find
$$ \mu_{n_0}(y) = \frac{\det A}{\det A(y)}. $$
Now $\det A$ is a nonzero constant, and $\det A(y)$ is bounded on $D$. It follows that $(-1)^{m_0+1} \mu_{n_0}(y)$ is bounded away from zero. Therefore there exists a constant $\eta > 0$ such that
$$ (-1)^{m_0+1} \mu_n(y) \ge \eta \quad \forall\, y \in D \setminus \{x_1, \ldots, x_{n_0}\}, \ n \ge n_0. $$   (4.4)
As $f \in N_{\phi,m}(D)$, $\langle \tilde s_n, \tilde s_n \rangle$ is bounded above by a positive constant $C$. Further, $\langle s_n, s_n \rangle$ is nonnegative. It follows from (4.3) and (4.4) that
$$ |f(y) - s_n(y)| \le \sqrt{C/\eta}, \quad y \in D \setminus \{x_1, \ldots, x_n\}. $$
Further, because $f$ is bounded on $D$, we obtain
$$ |s_n(y)| \le \sqrt{C/\eta} + \|f\|_\infty. $$
Note that the right-hand side is independent of $n$ and $y$, as required. Alternatively, if $y \in \{x_1, \ldots, x_n\}$, we have
$$ |s_n(y)| = |f(y)| \le \|f\|_\infty, $$
which completes the proof.

Next, by applying Proposition 6, we obtain a criterion that ensures that $f$ is in $N_{\phi,m}(D)$.

Proposition 10 Let $\phi$, $m$ and $\nu$ be defined as in Proposition 6, and let $f \in C^\nu(D)$, where $D \subset \mathbb{R}^d$ is compact. Then $f \in N_{\phi,m}(D)$.

Proof:
By Whitney's theorem ([20]), $f$ can be extended to a function $F \in C^\nu(\mathbb{R}^d)$ that is equal to $f$ on $D$. Now $D$ is contained in a closed ball of radius $R$, say, and there is an infinitely differentiable function $g$ with $g(x) = 1$ for $\|x\| \le R$ and $g(x) = 0$ for $\|x\| \ge 2R$. Thus $F \cdot g$ is in $C^\nu(\mathbb{R}^d)$, and by Proposition 6 it is in $N_{\phi,m}(\mathbb{R}^d)$. Since $F \cdot g$ is equal to $f$ on $D$, it follows from the definition of the semi-norm that $f \in N_{\phi,m}(D)$.

We complete this subsection by combining Theorem 7, Lemma 9 and Proposition 10.

Corollary 11 Let the assumptions of Theorem 7 on $\phi$ and $m$ hold. Let $\nu$ be as in Proposition 6, and let $f \in C^\nu(D)$. Further, assume that, for infinitely many $n \in \mathbb{N}$, $f_n^*$ has the property
$$ \min_{y \in D} s_n(y) - f_n^* \ge \epsilon \, \Delta_n^{\rho/2}, $$
where $\epsilon > 0$ is a constant, and where $\Delta_n$ and $\rho$ are as in the first paragraph of this subsection. Then the method converges.

4.2 Proof of Theorem 7


In order to establish Theorem 7, some lemmas are needed on the behaviour of the coefficients $\mu_n$.

Lemma 12 Let $\phi$ be any of the radial basis functions in (2.2), and let $m_0$ and $m \ge m_0$ be chosen as in Section 2. Let $D \subset \mathbb{R}^d$ be compact, and let $(x_n)_{n \in \mathbb{N}}$ be a convergent sequence in $D$ with pairwise different elements. Further, let $(y_n)_{n \in \mathbb{N}}$ be a sequence in $D$ such that $y_n \ne x_n$, $n \in \mathbb{N}$, and $\lim_{n \to \infty} \|x_n - y_n\| = 0$. Choose $k$ points $z_1, \ldots, z_k \in D$ that satisfy condition (2.11). Assume $(x_n)$ converges to $x^* \in D \setminus \{z_1, \ldots, z_k\}$. For any $y \in D \setminus \{z_1, \ldots, z_k, y_{n+1}\}$, let $\tilde\ell_y$ be the cardinal spline that interpolates the data $(z_1, 0), \ldots, (z_k, 0), (y_{n+1}, 0)$ and $(y, 1)$, and let $\tilde\mu_n(y)$ be the coefficient of $\tilde\ell_y$ that satisfies $(-1)^{m_0+1} \tilde\mu_n(y) = \langle \tilde\ell_y, \tilde\ell_y \rangle$. Then, for $0 \le \rho < 1$ in the linear case and $0 \le \rho < 2$ in the other cases,
$$ \lim_{n \to \infty} (-1)^{m_0+1} \, \|y_{n+1} - x_{n+1}\|^{\rho} \, \tilde\mu_n(x_{n+1}) = \infty. $$   (4.5)

Proof:
Let $A_n$ and $A_n(x_{n+1})$ be the matrices of the form (2.10) for the interpolation points $z_1, \ldots, z_k, y_{n+1}$ and $z_1, \ldots, z_k, y_{n+1}, x_{n+1}$, respectively. For sufficiently large $n$, neither $x_{n+1}$ nor $y_{n+1}$ is in the set $\{z_1, \ldots, z_k\}$. Thus, $A_n$ and $A_n(x_{n+1})$ are nonsingular. Cramer's rule implies
$$ \tilde\mu_n(x_{n+1}) = \frac{\det A_n}{\det A_n(x_{n+1})}. $$   (4.6)
Also, let $A$ be the matrix of the form (2.10) for the interpolation points $z_1, \ldots, z_k, x^*$. By the continuity of the determinant,
$$ \lim_{n \to \infty} \det A_n = \det A \ne 0. $$   (4.7)
In order to investigate the behaviour of $\|y_{n+1} - x_{n+1}\|^{-\rho} \det A_n(x_{n+1})$, we let
$$ v(y) := \big( \phi(\|y - z_1\|), \ldots, \phi(\|y - z_k\|) \big)^T, \quad y \in D, $$
and
$$ p(y) := \big( p_1(y), \ldots, p_{\hat m}(y) \big)^T, \quad y \in D, $$
where $\hat m = \dim \Pi_m$ and $p_1, \ldots, p_{\hat m}$ are as in Section 2. Thus $A_n(x_{n+1})$ is the matrix
$$ A_n(x_{n+1}) = \begin{pmatrix} \Phi & v(y_{n+1}) & v(x_{n+1}) & P \\ v(y_{n+1})^T & \phi(0) & \phi(\|y_{n+1} - x_{n+1}\|) & p(y_{n+1})^T \\ v(x_{n+1})^T & \phi(\|y_{n+1} - x_{n+1}\|) & \phi(0) & p(x_{n+1})^T \\ P^T & p(y_{n+1}) & p(x_{n+1}) & 0 \end{pmatrix}, $$   (4.8)
where $\Phi$ and $P$ correspond to (2.3) and (2.9), respectively, if we set $n = k$ and $\{x_1, \ldots, x_n\} = \{z_1, \ldots, z_k\}$. Now the rows
$$ \big( v(y_{n+1})^T \ \ \ \phi(0) \ \ \ \phi(\|y_{n+1} - x_{n+1}\|) \ \ \ p(y_{n+1})^T \big) $$
and
$$ \big( v(x_{n+1})^T \ \ \ \phi(\|y_{n+1} - x_{n+1}\|) \ \ \ \phi(0) \ \ \ p(x_{n+1})^T \big) $$
have the same limit as $n \to \infty$, so $\det A_n(x_{n+1})$ tends to zero. Hence the properties (4.6) and (4.7) prove the assertion (4.5) for $\rho = 0$.

For $\rho > 0$, note that the value of the determinant of the matrix (4.8) does not change if we replace the second row by the difference between the second and third rows, and subsequently replace the second column by the difference between the second and third columns. Then $\det A_n(x_{n+1})$ becomes
$$ \begin{vmatrix} \Phi & v(y_{n+1}) - v(x_{n+1}) & v(x_{n+1}) & P \\ v(y_{n+1})^T - v(x_{n+1})^T & 2[\phi(0) - \phi(\|y_{n+1} - x_{n+1}\|)] & \phi(\|y_{n+1} - x_{n+1}\|) - \phi(0) & p(y_{n+1})^T - p(x_{n+1})^T \\ v(x_{n+1})^T & \phi(\|y_{n+1} - x_{n+1}\|) - \phi(0) & \phi(0) & p(x_{n+1})^T \\ P^T & p(y_{n+1}) - p(x_{n+1}) & p(x_{n+1}) & 0 \end{vmatrix}. $$   (4.9)
We have to divide the determinant by $\|y_{n+1} - x_{n+1}\|^{\rho}$, so we divide the second row and then the second column of (4.9) by $\|y_{n+1} - x_{n+1}\|^{\rho/2}$. Then the following remarks are helpful.

For each choice of $\phi$ and each $j = 1, \ldots, k$, the function $\phi(\|z_j - x\|)$, $x \in D$, is Lipschitz continuous, so the components of $v(y_{n+1}) - v(x_{n+1})$ satisfy
$$ |\phi(\|z_j - y_{n+1}\|) - \phi(\|z_j - x_{n+1}\|)| \le \mathrm{const} \, \|x_{n+1} - y_{n+1}\|, \quad j = 1, \ldots, k. $$
Thus for $\rho < 2$ we have
$$ \lim_{n \to \infty} \frac{1}{\|y_{n+1} - x_{n+1}\|^{\rho/2}} \big[ \phi(\|z_j - y_{n+1}\|) - \phi(\|z_j - x_{n+1}\|) \big] = 0. $$   (4.10)
Similarly, for $\rho < 2$, the components of $p(y_{n+1}) - p(x_{n+1})$ have the property
$$ \lim_{n \to \infty} \frac{1}{\|y_{n+1} - x_{n+1}\|^{\rho/2}} \big[ p_i(y_{n+1}) - p_i(x_{n+1}) \big] = 0. $$   (4.11)
Finally, we deduce
$$ \lim_{n \to \infty} \frac{1}{\|y_{n+1} - x_{n+1}\|^{\rho}} \big[ \phi(\|y_{n+1} - x_{n+1}\|) - \phi(0) \big] = 0, $$   (4.12)
for $\rho < 1$ in the linear case, for $\rho < 2$ in the thin plate spline, multiquadric and Gaussian cases, and for $\rho < 3$ in the cubic case. This is clear in the linear, thin plate spline and cubic cases. In the other two cases it follows from a second order Taylor expansion of $\phi$, because $\phi'(0) = 0$ and $\phi''$ is bounded on $\mathbb{R}_+$. Thus (4.9)--(4.12) provide
$$ \lim_{n \to \infty} \|y_{n+1} - x_{n+1}\|^{-\rho} \det A_n(x_{n+1}) = 0 $$
for the given values of $\rho$. Hence (4.6) and (4.7) imply (4.5).

Now we obtain

Lemma 13 Let $\phi$, $m_0$ and $m$ be chosen as in Lemma 12, and let $(x_n)_{n \in \mathbb{N}}$ be the sequence generated by Algorithm 3. Further, let $0 \le \rho < 1$ in the linear case and $0 \le \rho < 2$ in the other cases. Then, for every convergent subsequence $(x_{n_k})_{k \in \mathbb{N}}$ of $(x_n)$,
$$ \lim_{k \to \infty} (-1)^{m_0+1} \, \Delta_{n_k}^{\rho} \, \mu_{n_k-1}(x_{n_k}) = \infty, $$
where $\mu_n(\cdot)$ is defined in Section 3 and $\Delta_{n_k}$ is expression (4.1) for $n = n_k$.

Proof:
For $n \ge 2$, define $j_n$ to be the natural number $j$ that minimizes $\|x_n - x_j\|$, $j < n$, so $\Delta_n = \|x_n - x_{j_n}\|$. Further, let $(y_n)_{n \in \mathbb{N}}$ be the sequence
$$ y_n := \begin{cases} x_2, & n = 1, \\ x_{j_n}, & n \ge 2. \end{cases} $$
Let $(x_{n_k})_{k \in \mathbb{N}}$ be a subsequence of $(x_n)_{n \in \mathbb{N}}$ that converges to $x^*$, say. Convergence and the choice of $(y_n)_{n \in \mathbb{N}}$ imply $\lim_{k \to \infty} \|x_{n_k} - y_{n_k}\| = 0$.

The initial step of Algorithm 3 provides a finite number of points that satisfy (2.11), so the initial interpolation matrix (2.10) is nonsingular. If one of these points is $x^*$, we pick $x_{n_{k_0}}$ in a neighbourhood of $x^*$ so that the interpolation matrix of $x_{n_{k_0}}$ and the other initial points is also nonsingular. Therefore there exist points $\hat x_1, \ldots, \hat x_l$ in $(x_n)_{n \in \mathbb{N}}$ such that their interpolation matrix is nonsingular, and $x^* \notin \{\hat x_1, \ldots, \hat x_l\}$.

For sufficiently large $k \in \mathbb{N}$, such that $y_{n_k} \notin \{\hat x_1, \ldots, \hat x_l\}$, and for any $y \in D \setminus \{x_1, \ldots, x_{n_k-1}\}$, we let $\hat\ell_k(y, \cdot)$ be the radial function that interpolates $(\hat x_1, 0), \ldots, (\hat x_l, 0), (y_{n_k}, 0)$ and $(y, 1)$, and we let $\ell_{n_k-1}(y, \cdot)$ be the interpolant to $(x_1, 0), \ldots, (x_{n_k-1}, 0)$ and $(y, 1)$. Because $\ell_{n_k-1}(y, \cdot)$ interpolates $(y_{n_k}, 0)$ and $(\hat x_i, 0)$, $i = 1, \ldots, l$, for sufficiently large $k$, (3.7) and Theorem 1 imply the inequality
$$ (-1)^{m_0+1} \hat\mu_k(y) = \langle \hat\ell_k(y, \cdot), \hat\ell_k(y, \cdot) \rangle \le \langle \ell_{n_k-1}(y, \cdot), \ell_{n_k-1}(y, \cdot) \rangle = (-1)^{m_0+1} \mu_{n_k-1}(y) $$   (4.13)
for the coefficients $\hat\mu_k$ and $\mu_{n_k-1}$.

We apply Lemma 12 in the case when $\{z_1, \ldots, z_k\}$ is the set $\{\hat x_1, \ldots, \hat x_l\}$ and $n = n_k - 1$. It follows that
$$ \lim_{k \to \infty} (-1)^{m_0+1} \, \Delta_{n_k}^{\rho} \, \hat\mu_k(x_{n_k}) = \lim_{k \to \infty} (-1)^{m_0+1} \, \|x_{n_k} - y_{n_k}\|^{\rho} \, \hat\mu_k(x_{n_k}) = \infty, $$
with the choice of $\rho$ stated in Lemma 12. Thus, setting $y = x_{n_k}$ in (4.13), we obtain that $(-1)^{m_0+1} \Delta_{n_k}^{\rho} \mu_{n_k-1}(x_{n_k})$ tends to infinity as $k \to \infty$.

Finally, we show, using Proposition 6, that the coefficients $\mu_n(y)$ are uniformly bounded if $y$ is bounded away from the points that are generated by the algorithm.

Lemma 14 Let $\phi(r) = r$, $\phi(r) = r^2 \log r$ or $\phi(r) = r^3$. Further, choose the integer $m$ such that $0 \le m \le d$ in the linear case, $1 \le m \le d+1$ in the thin plate spline case and $1 \le m \le d+2$ in the cubic case. Let $(x_n)_{n \in \mathbb{N}}$ be the sequence generated by Algorithm 3, and let $n_0$ be the number of points chosen in the initial step. Assume that there exist $y_0 \in D$ and a neighbourhood $N := \{x \in \mathbb{R}^d : \|x - y_0\| < \delta\}$, $\delta > 0$, that does not contain any point of the sequence. Then there exists $K > 0$, which depends only on $y_0$ and $\delta$, such that
$$ (-1)^{m_0+1} \mu_n(y_0) \le K \quad \forall\, n \ge n_0. $$

Proof:
For any $n \ge n_0$, let $\ell_n$ be the radial function that is defined by $\ell_n(x_i) = 0$, $i = 1, \ldots, n$, and $\ell_n(y_0) = 1$. There exists a compactly supported, infinitely differentiable function $F$ that takes the value 1 at $y_0$ and 0 on $\mathbb{R}^d \setminus N$. It follows from Proposition 6 that $F \in N_{\phi,m}(\mathbb{R}^d)$. Since $\ell_n$ interpolates $F$ at $x_1, \ldots, x_n$ and $y_0$, there is a positive number $K$, depending on $y_0$ and $\delta$, such that
$$ (-1)^{m_0+1} \mu_n(y_0) = \langle \ell_n, \ell_n \rangle \le K, \quad n \ge n_0. $$
Now we are ready to prove Theorem 7.


Proof of Theorem 7:
Assume there is y0 2 D and an open neighbourhood U = fx 2 IRd : kx ? y0k < g,
 > 0, that does not contain an interpolation point. The iteration step of Algorithm 3 gives
gn(xn+1 )  gn(y0); n  n0 ;
where n0 is the number of points chosen in the initial step of the algorithm.
By assumption (4.2), there is a subsequence (nk )k2IN of the natural numbers such
that
2
min
s (y) ? fnk ?1 >  =
(4.14)
nk ?1 ksnk ?1 k1  0; k 2 IN;
y2D nk ?1

with  > 0, nk ? being the expression (4.1) for n = nk ? 1, 0   < 1 in the linear
and 0   < 2 in the thin plate spline and cubic cases. The sequence (xnk )k2IN is a
sequence in a compact set, thus it contains a convergent subsequence. Therefore,
without loss of generality, we assume that (xnk )k2IN itself converges.
For all k 2 IN , xnk is the minimizer of gnk ? (x). Thus, if fnk ? > ?1,
1

(?1)m0 nk? (xnk )[snk ? (xnk ) ? fnk ? ]


 (?1)m0 nk ? (y )[snk ? (y ) ? fnk ? ] :
+1

+1

(4.15)

If $\|s_{n_k-1}\|_\infty > 0$, this inequality, condition (4.14) and the definition of $\|\cdot\|_\infty$ provide
$$ \begin{aligned} (-1)^{m_0+1} \mu_{n_k-1}(x_{n_k}) &\le (-1)^{m_0+1} \mu_{n_k-1}(y_0) \left[ \frac{s_{n_k-1}(y_0) - f^*_{n_k-1}}{s_{n_k-1}(x_{n_k}) - f^*_{n_k-1}} \right]^2 \\ &\le (-1)^{m_0+1} \mu_{n_k-1}(y_0) \left[ 1 + \frac{|s_{n_k-1}(y_0) - s_{n_k-1}(x_{n_k})|}{s_{n_k-1}(x_{n_k}) - f^*_{n_k-1}} \right]^2 \\ &\le (-1)^{m_0+1} \mu_{n_k-1}(y_0) \left[ 1 + \frac{1}{\epsilon \, \Delta_{n_k}^{\rho/2}} \, \frac{|s_{n_k-1}(y_0) - s_{n_k-1}(x_{n_k})|}{\|s_{n_k-1}\|_\infty} \right]^2 \\ &\le (-1)^{m_0+1} \mu_{n_k-1}(y_0) \left[ 1 + \frac{2}{\epsilon} \, \Delta_{n_k}^{-\rho/2} \right]^2. \end{aligned} $$
If $\|s_{n_k-1}\|_\infty = 0$, then $s_{n_k-1}(y) - f^*_{n_k-1}$ is a positive number independent of $y$, so (4.15) gives
$$ (-1)^{m_0+1} \mu_{n_k-1}(x_{n_k}) \le (-1)^{m_0+1} \mu_{n_k-1}(y_0) \le (-1)^{m_0+1} \mu_{n_k-1}(y_0) \left[ 1 + \frac{2}{\epsilon} \, \Delta_{n_k}^{-\rho/2} \right]^2 $$
for any positive $\epsilon$, as before. Remark 2 shows that this inequality also holds in the case $f^*_{n_k-1} = -\infty$. Multiplying both sides by $\Delta_{n_k}^{\rho}$ yields
$$ \Delta_{n_k}^{\rho} \, (-1)^{m_0+1} \mu_{n_k-1}(x_{n_k}) \le (-1)^{m_0+1} \mu_{n_k-1}(y_0) \left[ \Delta_{n_k}^{\rho/2} + \frac{2}{\epsilon} \right]^2. $$   (4.16)
By Lemma 13, the left-hand side of (4.16) tends to infinity as $k$ tends to infinity. However, Lemma 14 states that $(-1)^{m_0+1} \mu_n(y_0)$ is bounded above by a constant that does not depend on $n$. Thus the right-hand side of (4.16) is bounded by a constant that is independent of $k$, which contradicts (4.16). Therefore there is a point in the sequence that is an element of $U$. This implies that in each neighbourhood of an arbitrary $y \in D$ there are infinitely many elements of $(x_n)_{n \in \mathbb{N}}$, so the sequence is dense in $D$.
+1

5 Relations to statistical global optimization


In this section we consider the similarities between the given radial basis function method and the P-algorithm. The idea of that method was proposed by Kushner [11], [12] for one-dimensional problems. Here the objective function is regarded as a realization of a Brownian motion stochastic process. If real numbers $x_1 < \ldots < x_n$ are given, and their function values $f(x_1), \ldots, f(x_n)$ have been calculated, the model yields, for each $x$ in the feasible set, a mean value $\mathrm{Mean}(x)$ and a variance $\mathrm{Var}(x)$. $\mathrm{Mean}(x)$ serves as a prediction of the true function value at $x$, while $\mathrm{Var}(x)$ is a measure of uncertainty. It turns out that $\mathrm{Mean}$ is the piecewise linear interpolant of the given data. The variance is piecewise quadratic on $[x_1, x_n]$, nonnegative, and takes the value zero at $x_1, \ldots, x_n$. For a real number $x$, let $F_x$ be the normally distributed random variable with mean $\mathrm{Mean}(x)$ and variance $\mathrm{Var}(x)$. Now a nonnegative $\epsilon_n$ is chosen, and the next point $x_{n+1}$ will be the one that maximizes the utility function
$$ P\Big( F_x \le \min_{i=1,\ldots,n} f(x_i) - \epsilon_n \Big), \quad x \in D, $$   (5.1)
where $P$ denotes probability. One can show that maximizing (5.1) is equivalent to maximizing
$$ \frac{\mathrm{Var}(x)}{\big[ \mathrm{Mean}(x) - \min\{f(x_1), \ldots, f(x_n)\} + \epsilon_n \big]^2}, \quad x \in D. $$   (5.2)

Compare our method in one dimension with the choice of linear splines, i.e. $\phi(r) = r$ and $m = 0$, and with the target values $f_n^* = \min\{f(x_1), \ldots, f(x_n)\} - \epsilon_n$. In this case, the interpolant $s_n$ is identical to $\mathrm{Mean}$. Further, except for a constant factor,
$$ \mathrm{Var}(x) = -\frac{1}{\mu_n(x)}. $$
Therefore, Kushner's method and our method using linear splines are equivalent.
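This equivalence is easy to check numerically. The sketch below is an illustration of ours, not part of the paper: it compares $-1/\mu_n(x)$ for the linear spline $\phi(r) = r$, $m = 0$, with the Brownian-motion variance $(x - x_i)(x_{i+1} - x)/(x_{i+1} - x_i)$ between two neighbouring data points; with this normalization of the Brownian motion the constant factor is 2.

```python
import numpy as np

def mu_linear_1d(xs, x):
    """mu_n(x) for phi(r) = r, m = 0 in one dimension, from system (3.4)."""
    pts = np.append(xs, x)                     # interpolation points plus x
    n1 = len(pts)
    Phi = np.abs(pts[:, None] - pts[None, :])  # phi(|x_i - x_j|)
    P = np.ones((n1, 1))                       # constants, a basis of Pi_0
    A = np.block([[Phi, P], [P.T, np.zeros((1, 1))]])
    rhs = np.zeros(n1 + 1)
    rhs[n1 - 1] = 1.0                          # cardinality condition at x
    return np.linalg.solve(A, rhs)[n1 - 1]

xs = np.array([0.0, 0.3, 1.0, 1.7])
x = 0.6                                        # lies between 0.3 and 1.0
brownian_var = (x - 0.3) * (1.0 - x) / (1.0 - 0.3)
print(-1.0 / mu_linear_1d(xs, x), 2.0 * brownian_var)   # the two values agree
```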
Žilinskas [21], [22] extends this approach to Gaussian random processes in several dimensions. He uses the selection rule (5.1) and introduces the name "P-algorithm". In addition, he gives an axiomatic description of the terms involved in (5.1), namely the mean value function, the variance function and the utility function. We relate these results to our method.

Consider a symmetric function $\psi : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$, $(x, z) \mapsto \psi(x, z)$, and assume that $\psi$ is conditionally positive or negative definite of order $m$. This means that there exists $\beta \in \{0, 1\}$ such that, given $n$ different points $x_1, \ldots, x_n \in \mathbb{R}^d$ and multipliers $\lambda_1, \ldots, \lambda_n \in \mathbb{R}$, we have
$$ (-1)^{\beta} \sum_{i,j=1}^{n} \lambda_i \lambda_j \, \psi(x_i, x_j) > 0, $$
if the $\lambda_i$, $i = 1, \ldots, n$, are not all zero and satisfy
$$ \sum_{i=1}^{n} \lambda_i \, p(x_i) = 0, \quad p \in \Pi_m. $$
Denote the matrix with the elements $\psi(x_i, x_j)$, $i, j = 1, \ldots, n$, by $\Psi$, and the matrix with the elements $p_j(x_i)$, $i = 1, \ldots, n$, $j = 1, \ldots, \hat m$, by $P$, where $\{p_j : j = 1, \ldots, \hat m\}$ is a basis of $\Pi_m$ and $\hat m$ is its dimension. The analogue of expression (2.10) is the matrix
$$ A = \begin{pmatrix} \Psi & P \\ P^T & 0 \end{pmatrix}. $$   (5.3)
Further, we now let the interpolant $s$ to the components of $F = (f(x_1), \ldots, f(x_n))^T$ be the function
$$ s(y) = \sum_{i=1}^{n} \lambda_i \, \psi(y, x_i) + \sum_{j=1}^{\hat m} c_j \, p_j(y), $$
whose coefficients solve the system
$$ A \begin{pmatrix} \lambda \\ c \end{pmatrix} = \begin{pmatrix} F \\ 0_{\hat m} \end{pmatrix}. $$
It can be written as
$$ s(y) = v_m(y)^T A^{-1} \begin{pmatrix} F \\ 0_{\hat m} \end{pmatrix}, \quad y \in D, $$   (5.4)
where $v_m(y)$ is the vector
$$ v_m(y) := \big( \psi(y, x_1), \ldots, \psi(y, x_n), p_1(y), \ldots, p_{\hat m}(y) \big)^T. $$   (5.5)
The nonnegative function
$$ \mathrm{Var}(y) = \big| \psi(y, y) - v_m(y)^T A^{-1} v_m(y) \big|, \quad y \in D, $$   (5.6)
is assumed to be a measure of uncertainty. Note that $v_m(x_i)$ is the $i$-th column of $A$, $i = 1, \ldots, n$, so we obtain
$$ \mathrm{Var}(x_i) = \big| \psi(x_i, x_i) - v_m(x_i)^T A^{-1} v_m(x_i) \big| = \big| \psi(x_i, x_i) - v_m(x_i)^T e_i \big| = 0. $$
Thus there is no uncertainty at the interpolation points, which is meaningful because we know the true function values there.
For the P-algorithm, $\psi$ is interpreted as the correlation function of a Gaussian stochastic process. The use of a normal distribution, for example, gives $\psi(x, y) := e^{-\|x - y\|^2/\sigma^2}$, but other choices of $\psi$ are also considered in the literature. All of them are positive definite, so we set $m = -1$. The conditional mean and the conditional variance can be expressed as ([21])
$$ \mathrm{Mean}(y) = \big( f(x_1), \ldots, f(x_n) \big) \, \Psi^{-1} \begin{pmatrix} \psi(y, x_1) \\ \vdots \\ \psi(y, x_n) \end{pmatrix}, $$   (5.7)
$$ \mathrm{Var}(y) = \psi(y, y) - \big( \psi(y, x_1), \ldots, \psi(y, x_n) \big) \, \Psi^{-1} \begin{pmatrix} \psi(y, x_1) \\ \vdots \\ \psi(y, x_n) \end{pmatrix}, $$   (5.8)
which agree with (5.4) and (5.6).

For our method, given $\phi$ and $m$, it is suitable to define $\psi(x, y) := \phi(\|x - y\|)$. Thus $\Psi = \Phi$, and the coefficients of the interpolant $s$ solve (2.13). This gives the form
$$ s(y) = v_m(y)^T A^{-1} \begin{pmatrix} F \\ 0_{\hat m} \end{pmatrix}, $$   (5.9)
which is the same as the function (5.4).


We have seen already that $|1/\mu_n|$ can be regarded as a variance in the case of linear splines in one dimension. An expression for it containing the matrix $A$ and the vector $v_m(x)$ can be derived in the following way. For any $y \in D \setminus \{x_1, \ldots, x_n\}$, consider the cardinal function (3.3). The second cardinality condition from (3.2) implies
$$ \frac{1}{\mu_n(y)} = \phi(0) + \sum_{i=1}^{n} \frac{\alpha_i(y)}{\mu_n(y)} \, \phi(\|y - x_i\|) + \sum_{j=1}^{\hat m} \frac{b_j(y)}{\mu_n(y)} \, p_j(y). $$   (5.10)
The coefficients $\alpha(y)$, $\mu_n(y)$ and $b(y)$ solve the system (3.4). Moreover, the vector (5.5) contains the first $n$ and the last $\hat m$ elements of the $(n+1)$-th column of $A(y)$. Therefore $\alpha(y)$ and $b(y)$ also solve
$$ A \begin{pmatrix} \alpha(y) \\ b(y) \end{pmatrix} = -\mu_n(y) \, v_m(y), $$
which implies
$$ \frac{1}{\mu_n(y)} \begin{pmatrix} \alpha(y) \\ b(y) \end{pmatrix} = -A^{-1} v_m(y). $$
Thus, replacing the terms $\alpha_i(y)/\mu_n(y)$ and $b_j(y)/\mu_n(y)$ in (5.10), we find
$$ \frac{1}{\mu_n(y)} = \phi(0) - v_m(y)^T A^{-1} v_m(y). $$   (5.11)
It follows from $\psi(x, x) = \phi(0)$, $x \in D$, that expression (5.6) is equivalent to $|1/\mu_n(y)|$.
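Identity (5.11) is easy to verify numerically: with the matrices assembled as in the earlier sketches, $1/\mu_n(y)$ obtained from the bordered system (3.4) coincides with $\phi(0) - v_m(y)^T A^{-1} v_m(y)$. A small check of ours, for the cubic case in two dimensions (the random points and the dense solves are incidental choices):

```python
import numpy as np

phi = lambda r: r ** 3
rng = np.random.default_rng(0)
X = rng.random((6, 2))                          # interpolation points
y = rng.random(2)                               # evaluation point

Phi = phi(np.linalg.norm(X[:, None] - X[None, :], axis=2))
P = np.hstack([np.ones((6, 1)), X])
A = np.block([[Phi, P], [P.T, np.zeros((3, 3))]])
v = np.concatenate([phi(np.linalg.norm(X - y, axis=1)), [1.0], y])  # v_m(y)

# mu_n(y) from the bordered system (3.4)
Phi_y = np.block([[Phi, v[:6, None]], [v[None, :6], [[phi(0.0)]]]])
P_y = np.vstack([P, v[6:]])
A_y = np.block([[Phi_y, P_y], [P_y.T, np.zeros((3, 3))]])
rhs = np.zeros(10)
rhs[6] = 1.0
mu = np.linalg.solve(A_y, rhs)[6]

print(1.0 / mu, phi(0.0) - v @ np.linalg.solve(A, v))   # the two values agree
```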

Finally, we consider the selection rule for the next point. For the P-algorithm, it has already been noted that the maximization of (5.1) is equivalent to the maximization of (5.2). For a given target value $f^*$, define the function $U : \mathbb{R}^2 \to \mathbb{R}$ as
$$ U(M, V) := \frac{V}{(M - f^*)^2}. $$   (5.12)
It is increasing in $V$ and decreasing in $M$, if $f^* \le M$. Also, it satisfies the axioms of rational search stated in Žilinskas [22]. Then, employing (5.4) and (5.6), both methods choose the point that maximizes
$$ U\big( s(y), \mathrm{Var}(y) \big), \quad y \in D. $$
6 Search strategies and practical questions

Practical features of our method have received little attention in this paper. Several questions arise concerning the choice of parameters in Algorithm 3.
1. What radial basis function $\phi$ should be chosen, and what polynomial degree $m$?
2. What is a good strategy for the choice of the target values $f_n^*$?
3. Given a target value, how should the minimization of $g_n$ in (3.8) (or the maximization of $h_n$ in (3.9)) be carried out? Should we approximate the global optimum of $g_n$ (or $h_n$), or compute a (possibly non-global) local minimum?
The first problem has not been investigated thoroughly. Experiments using cubic and thin plate splines on a few test functions suggest that one cannot say in general that one of them is better than the other. Experiments have not been tried yet for the other types.
The choice of target values is crucial for the performance of the method. The interpretation of the two extremal cases has been noted in Section 3. Specifically, the choice $f_n^* = \min_{y \in D} s_n(y)$ means that we trust our model and assume that the minimizer of $s_n$ is close to the global minimizer of $f$. In the other case, namely $f_n^* = -\infty$, we try to find a point in a region that has not been explored at all. It may be best to employ a mix between values of $f_n^*$ that are suitable for convergence to a local minimizer and values that provide points in previously unexplored regions of the domain.
Research is going on for the third question. We prefer to maximize $h_n$, as this function is defined everywhere on $D$. It might seem strange that we consider computing the global optimum of $h_n$, i.e. that a global optimization problem is replaced by another one. However, unlike $f$, $h_n$ can be evaluated quickly, and derivatives are available as well. Also, we know roughly where the local maxima of $h_n$ lie. Thus the maximization of $h_n$ is much easier than the minimization of $f$. In addition, as the problem (GOP) is very difficult under our assumptions, it would take too long to compute a global minimizer accurately. Therefore, from a practical point of view, we are interested in an approximate solution of (GOP). So it should suffice to determine an approximation to the maximizer of (3.9). As far as the second option in question 3 is concerned, we have to find a way to choose starting points or search regions in order to ensure fast convergence, which is still an open problem.

function          dimension   domain              no. of local   no. of global
                                                  minima         minima
Branin            2           [-5,10] x [0,15]    3              3
Goldstein-Price   2           [-2,2]^2            4              1
Hartman 3         3           [0,1]^3             4              1
Shekel 5          4           [0,10]^4            5              1
Shekel 7          4           [0,10]^4            7              1
Shekel 10         4           [0,10]^4            10             1
Hartman 6         6           [0,1]^6             4              1

Table 1: Dixon-Szegő test functions and their dimension, the domain and the number of local and global minima.
Experiments show that large differences between function values can cause the interpolant to oscillate very strongly. Thus its minimal value can be much below the least calculated function value. We have found in numerical computations that these inefficiencies are reduced if large function values are replaced by the median of all available function values.
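A hedged sketch of this preprocessing step (our illustration; the text above only specifies replacement by the median, so the exact rule below is one possible reading):

```python
import numpy as np

def clip_to_median(F):
    """Replace function values above the median of all available values
    by the median itself, before fitting the interpolant."""
    F = np.asarray(F, dtype=float)
    return np.minimum(F, np.median(F))
```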
Some experiments were performed using the test functions proposed by Dixon and Szegő [2]. Table 1 gives the name of each function, the dimension, the domain and the number of local and global minima in that domain. The maximization of (3.9) was carried out using a version of the tunneling method (Levy and Montalvo [13]).

The target values $f_n^*$ are determined as follows. The idea is to perform cycles of $N+1$ iterations for some $N \in \mathbb{N}$, where each cycle employs a range of target values, starting with a low one (global search) and ending with a value of $f_n^*$ that is close to $\min_{y \in D} s_n(y)$ (local search). Then we go back to a global search, starting the cycle again. The results that we report have been obtained using the following strategy, which is also sketched in code after this paragraph. We choose the cycle length $N = 5$. Let the number of initial points be $n_0$, let the cycle start at $n = \tilde n$, and let the function values be ordered, i.e. $f(x_1) \le \ldots \le f(x_n)$. If $f(x_1) = f(x_n)$, the interpolant is a constant function, because we pick $m \ge 0$, so the maximization of (3.9) is equivalent to the maximization of $|1/\mu_n|$ if $f_n^* < f(x_1)$. In this case, we choose $f_n^* = -\infty$. Otherwise, for $\tilde n \le n \le \tilde n + N - 1$, we set
$$ f_n^* = \min_{y \in D} s_n(y) - \left( \frac{N - (n - \tilde n)}{N} \right)^2 \left[ f(x_{\alpha(n)}) - \min_{y \in D} s_n(y) \right], $$
where $\alpha(\tilde n) = \tilde n$ and $\alpha(n) = \alpha(n-1) - \lfloor (n - n_0)/N \rfloor$ for $\tilde n + 1 \le n \le \tilde n + N - 1$. When $n = \tilde n + N$ we set $f_n^* = \min_{y \in D} s_n(y)$. However, we have to be careful here, since this choice is only admissible if $x_1$ is not one of the global minimizers of $s_n$. Thus we do not accept this choice if $f(x_1) - \min_{y \in D} s_n(y)$ is smaller than a small tolerance relative to $|f(x_1)|$, provided $f(x_1)$ is nonzero, or smaller than a small absolute tolerance if $f(x_1) = 0$. In these cases, we subtract the same tolerance from $\min_{y \in D} s_n(y)$ to obtain $f_n^*$, to try to obtain a yet lower function value.
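A sketch of this cycling rule follows. It is our paraphrase under the reconstruction above: the handling of the constant-interpolant case and of the tolerance guard at the end of a cycle is omitted, and the index bookkeeping is one possible reading of the rule for $\alpha(n)$.

```python
def cycle_target(n, n0, n_tilde, F_sorted, s_min, N=5):
    """Target value f_n^* within a cycle of length N + 1 starting at
    iteration n_tilde.  F_sorted holds the ordered values
    f(x_1) <= ... <= f(x_n); s_min is min_{y in D} s_n(y)."""
    if n >= n_tilde + N:                       # last step of the cycle: local search
        return s_min
    w = ((N - (n - n_tilde)) / N) ** 2         # weight decreasing from 1 towards 0
    a = len(F_sorted)                          # alpha(n_tilde) = n (largest value)
    for j in range(n_tilde + 1, n + 1):        # alpha(j) = alpha(j-1) - floor((j-n0)/N)
        a -= (j - n0) // N
    return s_min - w * (F_sorted[a - 1] - s_min)
```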
We use thin plate splines as the radial basis function, except for the Hartman 3 function, where we take cubic splines. The algorithm is stopped when the relative error $|f_{\mathrm{best}} - f^*| / |f^*|$ becomes smaller than a fixed $\epsilon$, where $f^*$ is the global optimum and $f_{\mathrm{best}}$ the current best function value. The optimal values of all test functions in Table 1 are nonzero, so this stopping criterion is valid.
Table 2 reports the number of function evaluations needed to achieve a relative error less than 1% and 0.01%. RBF denotes our method using the target value strategy described above. DIRECT (Jones, Perttunen and Stuckman [9]) and MCS (Multilevel Coordinate Search, Huyer and Neumaier [5]) are recent methods that, according to the results presented in those papers, are more efficient than most of their competitors on the Dixon-Szegő testbed. DE (Differential Evolution, Storn and Price [18]) is an evolutionary method that operates only at the global level, which explains the large number of function evaluations. EGO (Efficient Global Optimization, Jones, Schonlau and Welch [10]) is the latest method known to us. Unfortunately, no tests are reported on the Shekel test functions. All the results for these methods are quoted from the papers mentioned above. For the DIRECT method, numbers of evaluations for both the 1% and the 0.01% stopping criterion are reported. For DE and EGO only results for the 1% criterion are available, whereas MCS only uses the 0.01% criterion. It should be noted that MCS uses a local search method at some stages of the algorithm, and in all the cases of Table 2 the first local minimum found is the global one.
7 Conclusions

Our global optimization method converges to the global minimizer of an arbitrary continuous function $f$, if we choose the sequence of target values carefully.
                        error < 1%                   error < 0.01%
Test function      RBF  DIRECT    DE   EGO       RBF  DIRECT   MCS
Branin              44      63   1190    28        64     195    41
Goldstein-Price     63     101   1018    32        76     191    81
Hartman 3           25      83    476    35        79     199    79
Shekel 5            76     103   6400     -       100     155    83
Shekel 7            76      97   6194     -       125     145   129
Shekel 10           51      97   6251     -       112     145   103
Hartman 6          112     213   7220   121       160     571   111

Table 2: Number of function evaluations for our method in comparison to DIRECT, DE, EGO and MCS with two different stopping criteria.
If $f$ is sufficiently smooth, there is even a suitable condition on this sequence that can be checked by the algorithm. However, it is unsatisfactory that the multiquadric and Gaussian cases are excluded from the statement of Theorem 7. It is believed that the convergence result is true also in these cases, although they are not covered by the analysis in [3].

Table 2 shows that the method is able to compete with other global optimization methods on the set of the Dixon-Szegő test functions. The test functions in this testbed, however, are of relatively low dimension, and the number of local and global minima is very small. Therefore, it is necessary to test the method on other sets of test functions and, of course, on real-world applications.

The relation to the P-algorithm is very interesting. It is hoped that the connections can be exploited further. In particular, the choice of the target values and the determination of the point of highest utility are common problems. Solutions to these problems may be developed that are useful for both methods.

Acknowledgement I am very grateful to my supervisor, Prof. M.J.D. Powell, for his constant help and his guidance. Also, I would like to thank the German Academic Exchange Service for supporting this research with a doctoral scholarship (HSP III) and the Engineering and Physical Sciences Research Council for further support with a research grant.


References
[1] P. Alotto, A. Caiti, G. Molinari, and M. Repetto. A Multiquadrics-based Algorithm for the Acceleration of Simulated Annealing Optimization Procedures. IEEE Transactions on Magnetics, 32(3):1198-1201, 1996.
[2] L.C.W. Dixon and G.P. Szegő. The Global Optimization Problem: An Introduction. In L.C.W. Dixon and G.P. Szegő, editors, Towards Global Optimization 2, pages 1-15. North-Holland, Amsterdam, 1978.
[3] H.-M. Gutmann. On the semi-norm of radial basis function interpolants. In preparation.
[4] R. Horst and P.M. Pardalos. Handbook of Global Optimization. Kluwer, Dordrecht, 1994.
[5] W. Huyer and A. Neumaier. Global optimization by multilevel coordinate search. Journal of Global Optimization, 14(4):331-355, 1999.
[6] T. Ishikawa and M. Matsunami. An Optimization Method Based on Radial Basis Functions. IEEE Transactions on Magnetics, 33(2):1868-1871, 1997.
[7] T. Ishikawa, Y. Tsukui, and M. Matsunami. A Combined Method for the Global Optimization Using Radial Basis Function and Deterministic Approach. IEEE Transactions on Magnetics, 35(3):1730-1733, 1999.
[8] D.R. Jones. Global optimization with response surfaces. Presented at the Fifth SIAM Conference on Optimization, Victoria, Canada, 1996.
[9] D.R. Jones, C.D. Perttunen, and B.E. Stuckman. Lipschitz Optimization Without the Lipschitz Constant. Journal of Optimization Theory and Applications, 78(1):157-181, 1993.
[10] D.R. Jones, M. Schonlau, and W.J. Welch. Efficient Global Optimization of Expensive Black-Box Functions. Journal of Global Optimization, 13(4):455-492, 1998.
[11] H.J. Kushner. A Versatile Model of a Function of Unknown and Time Varying Form. Journal of Mathematical Analysis and Applications, 5:150-167, 1962.
[12] H.J. Kushner. A New Method of Locating the Maximum Point of an Arbitrary Multipeak Curve in the Presence of Noise. Journal of Basic Engineering, 86:97-106, 1964.
[13] A.V. Levy and A. Montalvo. The Tunneling Algorithm for the Global Minimization of Functions. SIAM Journal on Scientific and Statistical Computing, 6(1):15-29, 1985.
[14] M.J.D. Powell. Approximation Theory and Methods. Cambridge University Press, 1981.
[15] M.J.D. Powell. The Theory of Radial Basis Function Approximation in 1990. In W.A. Light, editor, Advances in Numerical Analysis, Volume 2: Wavelets, Subdivision Algorithms and Radial Basis Functions, pages 105-210. Oxford University Press, 1992.
[16] M.J.D. Powell. Recent research at Cambridge on radial basis functions. In M.W. Müller, M.D. Buhmann, D.H. Mache, and M. Felten, editors, New Developments in Approximation Theory, International Series of Numerical Mathematics, Vol. 132, pages 215-232. Birkhäuser Verlag, Basel, 1999.
[17] R. Schaback. Comparison of radial basis function interpolants. In K. Jetter and F. Utreras, editors, Multivariate Approximations: From CAGD to Wavelets, pages 293-305. World Scientific, Singapore, 1993.
[18] R. Storn and K. Price. Differential Evolution - A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces. Journal of Global Optimization, 11(4):341-359, 1997.
[19] A. Törn and A. Žilinskas. Global Optimization. Springer, Berlin, 1987.
[20] H. Whitney. Analytic extension of differentiable functions defined in closed sets. Transactions of the American Mathematical Society, 36:63-89, 1934.
[21] A. Žilinskas. Axiomatic Approach to Statistical Models and their Use in Multimodal Optimization Theory. Mathematical Programming, 22(1):104-116, 1982.
[22] A. Žilinskas. Axiomatic Characterization of a Global Optimization Algorithm and Investigation of its Search Strategy. Operations Research Letters, 4(1):35-39, 1985.
