CH2752-4/89/0000/0104$01.00 © 1989 IEEE
Elschlager (1973) use a related method for matching templates by continuous deformations.

2 Preprocessing

The deformable templates act on three representations of the image, as well as on the image itself. These representations are chosen to extract properties of the image, such as peaks and valleys in the image intensity and places where the image intensity changes quickly. An additional representation could be added to describe textural properties. An advantage of using these representations is that the templates need only be specified in simple terms. For example, we do not need to specify the intensity values on the iris, merely that the iris is a valley in the image intensity. Another advantage of using these representations is that they enable long range interactions to occur.

These representations do not have to be very precise, and they can be calculated fairly simply. Our present methods involve using morphological filters (Maragos 1987, Serra 1982) to extract these features. The fields are smoothed to ensure long range interactions; for details see Yuille, Cohen and Hallinan (1988).

3 The Eye Template

After some experimentation and informal psychophysics on the salience of different features of eyes we decided that the template should consist of the following features:

(1) A circle of radius $r$, centered on a point $\vec{x}_c$. This corresponds to the boundary between the iris and the whites of the eye and is attracted to edges in the image intensity. The interior of the circle is attracted to valleys, or low values, in the image intensity.

(2) A bounding contour of the eye attracted to edges. This contour is modelled by two parabolic sections representing the upper and lower parts of the boundary. It has a center $\vec{x}_e$, width $2b$, maximum height $a$ of the boundary above the center, maximum height $c$ of the boundary below the center, and an angle of orientation $\theta$.

(3) Two points, corresponding to the centers of the whites of the eyes, which are attracted to peaks in the image intensity. These points are labelled by $\vec{x}_e + p_1(\cos\theta, \sin\theta)$ and $\vec{x}_e + p_2(\cos\theta, \sin\theta)$, where $p_1 \ge 0$ and $p_2 \le 0$. The point $\vec{x}_e$ lies at the center of the eye and $\theta$ corresponds to the orientation of the eye.

(4) The regions between the bounding contour and the iris also correspond to the whites of the eyes. They will be attracted to large values in the image intensity.

These components are linked together by three types of forces: (i) forces which encourage $\vec{x}_c$ and $\vec{x}_e$ to be close together, (ii) forces which make the width $2b$ of the eye roughly four times the radius $r$ of the iris, and (iii) forces which encourage the centers of the whites of the eyes to be roughly midway from the center of the eye to the boundary.

The template is illustrated in figure 1. It has a total of eleven parameters: $\vec{x}_c$, $\vec{x}_e$, $p_1$, $p_2$, $r$, $a$, $b$, $c$ and $\theta$. All of these are allowed to vary during the matching.

Figure 1. See text.

To give the explicit representation for the boundary we first define two unit vectors

$$\vec{e}_1 = (\cos\theta, \sin\theta) \qquad (1)$$

and

$$\vec{e}_2 = (-\sin\theta, \cos\theta), \qquad (2)$$

which change as the orientation of the eye changes. A point $\vec{x}$ in space can be represented by $(x_1, x_2)$ where

$$\vec{x} = x_1 \vec{e}_1 + x_2 \vec{e}_2. \qquad (3)$$

Using these coordinates the top half of the boundary can be represented by a section of a parabola with $x_1 \in [-b, b]$,

$$x_2 = a - \frac{a}{b^2} x_1^2. \qquad (4)$$

Note that the maximal height, $x_2$, of the parabola is $a$ and the height is zero at $x_1 = \pm b$. Similarly the lower half of the boundary is given by

$$x_2 = -c + \frac{c}{b^2} x_1^2, \qquad (5)$$

where $x_1 \in [-b, b]$.

3.1 The Energy Function for the Eye Template

We now define a potential energy function for the image which will be minimized as a function of the parameters of the template. This energy function not only ensures that the algorithm will converge, by acting as a Lyapunov function, but also gives a measure of the goodness of fit of the template.
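The boundary geometry of equations (1) to (5) can be illustrated with a short sketch. The function names below are hypothetical, and the mapping to image coordinates assumes that the template coordinates $(x_1, x_2)$ are measured from the eye center $\vec{x}_e$, which the text implies but does not state explicitly.

```python
import math

def template_axes(theta):
    """Unit vectors e1, e2 of equations (1) and (2)."""
    e1 = (math.cos(theta), math.sin(theta))
    e2 = (-math.sin(theta), math.cos(theta))
    return e1, e2

def upper_boundary(x1, a, b):
    """Height x2 of the upper parabola, equation (4), for x1 in [-b, b]."""
    return a - (a / b**2) * x1**2

def lower_boundary(x1, c, b):
    """Height x2 of the lower parabola, equation (5), for x1 in [-b, b]."""
    return -c + (c / b**2) * x1**2

def to_image(x1, x2, centre, theta):
    """Map template coordinates to image coordinates.

    Assumption: the origin of (x1, x2) is the eye centre x_e.
    """
    e1, e2 = template_axes(theta)
    return (centre[0] + x1 * e1[0] + x2 * e2[0],
            centre[1] + x1 * e1[1] + x2 * e2[1])
```

For example, `upper_boundary(0.0, a, b)` returns the maximal height `a`, and both parabolae return zero at `x1 = b` and `x1 = -b`, as noted after equation (4).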
The complete energy function $E_c(\vec{x}_e, \vec{x}_c, p_1, p_2, a, b, c, r, \theta)$ is given as a combination of terms due to valley, edge, peak, image and internal potentials. More precisely,

$$E_c = E_v + E_e + E_i + E_p + E_{internal}, \qquad (6)$$

where:

(i) The valley potentials are given by the integral over the interior of the circle divided by the area of the circle,

$$E_v = -\frac{c_1}{\mathrm{Area}} \iint_{\mathrm{Circle}} \Phi_v(\vec{x})\, dA. \qquad (7)$$

(ii) The edge potentials are given by the integrals over the boundaries of the circle divided by its length and over the parabolae divided by their lengths,

$$E_e = -\frac{c_2}{\mathrm{Length}} \int_{\mathrm{Circle\text{-}Bound}} \Phi_e(\vec{x})\, ds - \frac{c_3}{\mathrm{Length}} \int_{\mathrm{Para\text{-}Bound}} \Phi_e(\vec{x})\, ds. \qquad (8)$$

(iii) The image potentials have contributions which attempt to minimize the total brightness inside the circle divided by its area, and maximize it between the circle and the parabolae (again divided by the area),

$$E_i = \frac{c_4}{\mathrm{Area}} \iint_{\mathrm{Circle}} \Phi_i(\vec{x})\, dA - \frac{c_5}{\mathrm{Area}} \iint_{\mathrm{Whites}} \Phi_i(\vec{x})\, dA. \qquad (9)$$

(iv) The peak potentials, evaluated at the two peak points, are given by

$$E_p = c_6 \{ \Phi_p(\vec{x}_e + p_1 \vec{e}_1) + \Phi_p(\vec{x}_e + p_2 \vec{e}_1) \}. \qquad (11)$$

[...] force, the orientation by the peak force, and the fine scale detail by the edge and intensity forces. In this scenario the values of the $c$'s will be changed dynamically. Typical values for the coefficients are $(c_1, c_2, c_3, c_4, c_5, c_6) = (4000, 50, 50, 125, 150, 50)$ and $(k_1, k_2, k_3) = (10, 1, 0.05)$.

The individual energy terms can be written as functions of the parameter values. For example, the sum over the boundary can be expressed as an integral function of $\vec{x}_e$, $a$, $b$, $c$ and $\theta$ by

[...]

where $s$ corresponds to the arc length of the curve and Length to its total length. Note that scale independence is achieved by dividing line integrals by their total length and double integrals (over regions) by their area.

The minimization is done by steepest descent of the energy function in parameter space. It is assumed that preprocessing, or interactions between different templates (see section (7)), will allow the eye-template to start relatively near the correct position. In some situations several different templates may be required.

Thus the update rule for a parameter, for example $r$, is given by

$$\frac{dr}{dt} = -\frac{\partial E_c}{\partial r}.$$

These terms are explicitly calculated in Yuille, Cohen and Hallinan (1988).
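As a rough illustration of steepest descent on an area-normalized energy term, the sketch below minimizes only the valley term, $E_v$, with respect to the circle center. Everything in it is an assumption made for the example rather than the paper's implementation: the valley field is a synthetic Gaussian standing in for a smoothed morphological valley map, the radius is held fixed, the gradient is taken by central differences rather than analytically, and the names, step size, and coefficient value are hypothetical.

```python
import numpy as np

def disc_samples(n=400):
    """Area-uniform sample points in the unit disc (sunflower spiral)."""
    k = np.arange(n) + 0.5
    rad = np.sqrt(k / n)
    ang = k * np.pi * (3.0 - np.sqrt(5.0))  # golden angle
    return np.stack([rad * np.cos(ang), rad * np.sin(ang)], axis=1)

UNIT_DISC = disc_samples()

def valley_field(pts, centre=(40.0, 30.0), sigma=15.0):
    """Hypothetical smoothed valley field: large near the iris centre."""
    d2 = np.sum((pts - np.asarray(centre)) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def valley_energy(params, r=5.0, c1=1.0):
    """E_v = -(c1 / Area) * integral of the valley field over the circle.

    The area-normalized integral is approximated by the mean of the
    field over area-uniform sample points inside the circle.
    """
    centre = np.asarray(params, dtype=float)
    pts = centre + r * UNIT_DISC
    return -c1 * valley_field(pts).mean()

def numerical_gradient(f, p, h=1e-3):
    """Central-difference gradient of f at parameter vector p."""
    p = np.asarray(p, dtype=float)
    grad = np.zeros_like(p)
    for i in range(p.size):
        step = np.zeros_like(p)
        step[i] = h
        grad[i] = (f(p + step) - f(p - step)) / (2.0 * h)
    return grad

def steepest_descent(f, p0, rate=100.0, iters=200):
    """Discrete form of dp/dt = -dE/dp with a fixed step size."""
    p = np.asarray(p0, dtype=float)
    for _ in range(iters):
        p = p - rate * numerical_gradient(f, p)
    return p

start = np.array([55.0, 45.0])
centre = steepest_descent(valley_energy, start)
```

Starting from `(55, 45)`, the center drifts toward the synthetic valley at `(40, 30)` and the energy decreases, which mirrors the role of the valley force in pulling the template toward the eye. Because the integral is divided by the area, the value of the term does not grow simply because the circle does.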
[...] coefficients. The templates usually converged to the eye provided they were started at or below it. The valleys from the eyebrows caused problems if the template was started above the eye.

The values of the coefficients changed automatically during the course of the program to define six distinct epochs:

(i) The coefficients of the valley forces are strong and the coefficients of the peak, edge and intensity forces are zero. During this epoch the valley forces pull the template to the eye.

(ii) The coefficients of the intensity forces for the circle are increased. This helps scale the circle to the correct size of the iris.

(iii) The edge coefficients for the boundary of the circle increase. This fine tunes the size of the circle as it locks onto the iris.

(iv) The peak coefficients increase. This enables the peak forces to rotate the template and get the correct orientation.

(v) The coefficients of the intensity forces for the whites of the eyes are increased. This helps adjust the size of the outer boundary of the template.

(vi) The coefficients of the edges of the boundary are increased. This fine tunes the positions of the boundaries.

5 Extensions and Future Work

Yuille, Cohen and Hallinan (1988) describe how this work can be extended to detect mouths. We define a parameterized template for the mouth and allow it to adjust itself to the image; see figure 3.
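The six-epoch coefficient schedule used for the eye template can be sketched as a lookup that switches energy terms on one at a time. This is only an illustration under stated assumptions: the text says coefficients are "increased" without giving per-epoch values, so the sketch simply jumps each coefficient from zero to the typical value quoted earlier, and the assignment of $c_1$ to $c_6$ to the valley, circle-edge, parabola-edge, circle-intensity, whites-intensity and peak terms follows the order of the energy terms and is itself an assumption. The names `TYPICAL`, `EPOCH_ORDER` and `coefficients_for_epoch` are hypothetical.

```python
# Typical coefficient values quoted in the text (assumed to be the "on" values).
TYPICAL = {"c1": 4000, "c2": 50, "c3": 50, "c4": 125, "c5": 150, "c6": 50}

# Assumed order in which terms are switched on, one per epoch:
# valley, circle intensity, circle edge, peak, whites intensity, parabola edge.
EPOCH_ORDER = ["c1", "c4", "c2", "c6", "c5", "c3"]

def coefficients_for_epoch(epoch):
    """Return the coefficient set for epoch 1..6 (earlier terms stay on)."""
    if not 1 <= epoch <= len(EPOCH_ORDER):
        raise ValueError("epoch must be between 1 and 6")
    active = set(EPOCH_ORDER[:epoch])
    return {name: (value if name in active else 0)
            for name, value in TYPICAL.items()}

for epoch in range(1, 7):
    print(epoch, coefficients_for_epoch(epoch))
```

A descent loop would call `coefficients_for_epoch` whenever the current epoch has converged and then continue minimizing with the new weights, so that coarse localization precedes fine boundary adjustment.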
Our strategy for the implementation was to use preprocessing to set the initial values of the template parameters. An alternative method would be to start several deformable templates off in parallel and see which gives the best results. This would require some criteria for selecting the best fit. A natural choice would be the one with the lowest final energy function. This, however, might need to be supplemented by taking into account the spatial relationships to other features and the a priori probability of the final parameter values. In some special cases it may be possible for the energy to be low but for the parameter values to be extremely unlikely. Such a situation can occur if the mouth template gets started on the eye and becomes grotesquely deformed (Yuille, Cohen and Hallinan 1988).

Interactions between templates may also be necessary for detection. The features of the face are constrained to have certain spatial relationships with each other, and this should affect the detection. These forces might be mediated by springs. Moreover, once a feature is detected the potential fields corresponding to it can be removed, thereby making it easier to detect the remaining features. For example, once the eyebrows are detected, removing the valley fields associated with them would make it easier to detect the eyes.

Minimizing the cost function can be thought of in terms of extremizing a probability distribution à la Bayes, in the spirit of the Clifford-Hammersley (1964) theorem. The form of the template gives an a priori expectation for the structure of the feature, and the edge, valley and peak fields correspond to the probability of the edges, valleys and peaks given the template.

Deformable templates seem to have a large number of possible applications. Nitzberg (1988) has used them for detecting triple points in an image. Another possibility is to use them for perceptual grouping; a set of these templates (capable of describing many salient shapes) could interact with the image and those with the best matches (least energy) would be chosen to order the image. The visual system would "hypothesize" many different structures, allow them to interact with the image and then choose the best. It is unclear, however, how many templates would be needed, how many different starting points in the image and how computationally intensive this procedure would be. A second possibility is to use deformable templates to describe the three dimensional surfaces and allow the reflectance function to be specified by a finite number of parameters (allowing for possible directions of the light source, different types of reflectance, etc). Suppose, for example, that we have a deformable template representing the three-dimensional geometry of the nose. The reflectance function might also be specified by a finite set of parameters (allowing Lambertian plus specularity). The geometry of the nose and its reflectance will then be described by a finite set of parameters. This can then be related to the image by the image irradiance equation. There will then (usually) be a sufficient number of equations to solve for the parameters.

The facial features detected by deformable templates can be used as inputs to a recognition system. Another interesting application would be to use them as inputs to a shape from symmetry scheme (Gordon 1988) to detect the orientation of a face.

6 Conclusion

A serious problem for detection of edges, or other features, seems to lie in combining local information, which may be easily obtained, into a global structure. Snakes (Kass, Witkin and Terzopoulos 1987; Terzopoulos, Witkin and Kass 1987) provide an elegant way of linking local information to form edges, and of providing a priori knowledge about the likely structure of an edge. For the purpose of detecting facial features, however, a lot more a priori information is available and a deformable template is able to capture it. Moreover, such templates are not only able to detect a feature but can also provide a description of it for classification and matching to a data base.

7 Acknowledgements

A.L.Y. would like to thank the Brown, Harvard and M.I.T. Center for Intelligent Control Systems for a United States Army Research Office grant number DAAL03-86C-0171. We would also like to thank Roger Brockett for his support. Conversations with Jim Clark, David Mumford, Petros Maragos, Mark Nitzberg, Gaile Gordon and Roger Brockett were extremely useful.

References

Burr, D.J. "A Dynamic Model for Image Registration". Computer Graphics and Image Processing. 15, pp 102-112. 1981.

Burr, D.J. "Elastic Matching of Line Drawings". IEEE Trans. Pattern Analysis and Machine Intelligence. PAMI-3, No. 6, pp 708-713. 1981.

Durbin, R. and Willshaw, D.J. "An analogue approach to the travelling salesman problem using an elastic net method". Nature. 1987.

Durbin, R., Szeliski, R. and Yuille, A.L. In preparation.

Fischler, M.A. and Elschlager, R.A. "The representation and matching of pictorial structures". IEEE Trans. Computers. Vol 22. 1. 1973.

Gordon, G. "Shape from symmetry". Submitted to CVPR. 1988.

Hammersley, J.M. and Handscomb, D.C. Monte-Carlo Methods. Methuen and Company. London. 1964.
Kanade, T. Computer recognition of human faces. Birkhauser Verlag. Basel and Stuttgart. 1977.

Kass, M., Witkin, A. and Terzopoulos, D. "Snakes: Active Contour Models". Proc. First International Conference on Computer Vision. London. June 1987.

Maragos, P. "Tutorial on Advances in Morphological Image Processing and Analysis". Optical Engineering, vol. 26, pp. 623-632, July 1987.

Nitzberg, M. "Triple point detection". Submitted to CVPR. 1988.

Pentland, A. "Recognition by Parts". ICCV. London. 1987.

Serra, J. Image Analysis and Mathematical Morphology. NY: Acad. Press, 1982.

Terzopoulos, D., Witkin, A., and Kass, M. "Symmetry-seeking models for 3D Object Recognition". Proc. First International Conference on Computer Vision. London. June 1987.

Yuille, A.L., Cohen, D.S. and Hallinan, P.W. "Facial feature extraction by deformable templates". Harvard Robotics Lab. Tech.