
Feature extraction from faces using deformable templates

Alan L. Yuille, David S. Cohen and Peter W. Hallinan.

Harvard Robotics Laboratory, Division of Applied Sciences, Harvard University.

CH2752-4/89/0000/0104$01.00 © 1989 IEEE

Abstract

We propose a method for detecting and describing features of faces using deformable templates. The feature of interest, an eye for example, is described by a parameterized template. An energy function is defined which links edges, peaks and valleys in the image intensity to corresponding properties of the template. The template then interacts dynamically with the image, by altering its parameter values to minimize the energy function, thereby deforming itself to find the best fit. The final parameter values can be used as descriptors for the feature. We illustrate this method by showing deformable templates detecting eyes and mouths in real images.

1 Introduction

The ability to detect and describe salient features is an important component of a face recognition system. Such features include the eyes, nose, mouth, and eyebrows. This task is hard despite pioneering work by Kanade (1977) and much research on edge detection and image segmentation. Current edge detectors, for example, seem unable to reliably find features such as the boundary of the eye. The problem seems to be that although it is often straightforward to find local evidence for edges, it is hard to organize this local information into a sensible global percept.

We propose a new method to detect such features by using deformable templates. These templates are specified by a set of parameters which enables a priori knowledge about the expected shape of the features to guide the detection process. The templates are flexible enough to be able to change their size, and other parameter values, so as to match themselves to the data. The final values of these parameters can be used to describe the features. The method should work despite variations in scale, tilt and rotation of head, and lighting conditions. Variations of the parameters should allow the template to fit any normal instance of the feature.

The deformable templates interact with the image in a dynamic manner. An energy function is defined which contains terms attracting the template to salient features, such as peaks and valleys in the image intensity, edges (edges alone seem insufficient), and the intensity itself. The minimum of the energy function corresponds to the best fit with the image. The parameters of the template are then updated by steepest descent. This corresponds to following a path in parameter space, and contrasts with traditional methods of template matching which would involve sampling the parameter space to find the best match (and would be very expensive computationally). Changing these parameters corresponds to altering the position, orientation, size, and other properties of the template. The initial values of the parameters, which may be very different from the final values, are determined by preprocessing.

These deformable templates have some similarities with elastic deformable models (Burr 1981a, 1981b, Durbin and Willshaw 1987, Durbin, Szeliski and Yuille 1988) and to snakes (Kass, Witkin and Terzopoulos 1987, Terzopoulos, Witkin and Kass 1987). There is an important difference, however. All the elastic models have forces which interact with the image and other forces that prevent the structure from deforming too much. For these models, for example for snakes, the structure forces are only local (although they minimize global quantities such as the squared curvature and length of the entire curve). This means that effects on one end of the snake take time to propagate to the other end (snakes are too "floppy" for our purpose). In contrast, the structure forces on the deformable templates are global (as large as the template) and the interactions are long range. Moreover deformable templates, unlike snakes, only involve a finite number of parameters, which can then be used to give a compact description of the feature. Indeed objects like snakes can be thought of as deformable templates in the limit as the number of parameters goes to infinity. Snakes are not well suited for our task since: (i) they do not take into account the specific a priori knowledge available, (ii) they have more parameters to update and hence are computationally slower, and (iii) they do not account for interactions over a region.

Our work is also related to Pentland's method (1987) of representing geometric structures in terms of parameterized models and fitting them to depth data by least squares techniques. Fischler and
Elschlager (1973) use a related method for matching templates by continuous deformations.

2 Preprocessing

The deformable templates act on three representations of the image, as well as on the image itself. These representations are chosen to extract properties of the image, such as peaks and valleys in the image intensity and places where the image intensity changes quickly. An additional representation could be added to describe textural properties. An advantage of using these representations is that the templates need only be specified in simple terms. For example we do not need to specify the intensity values on the iris, merely that the iris is a valley in the image intensity. Another advantage of using these representations is that they enable long range interactions to occur.

These representations do not have to be very precise, and they can be calculated fairly simply. Our present methods involve using morphological filters (Maragos 1987, Serra 1982) to extract these features. The fields are smoothed to ensure long range interactions; for details see Yuille, Cohen and Hallinan (1988).

3 The Eye Template

After some experimentation and informal psychophysics on the salience of different features of eyes we decided that the template should consist of the following features:

(1) A circle of radius $r$, centered on a point $\vec{x}_c$. This corresponds to the boundary between the iris and the whites of the eye and is attracted to edges in the image intensity. The interior of the circle is attracted to valleys, or low values, in the image intensity.

(2) A bounding contour of the eye attracted to edges. This contour is modelled by two parabolic sections representing the upper and lower parts of the boundary. It has a center $\vec{x}_e$, width $2b$, maximum height $a$ of the boundary above the center, maximum height $c$ of the boundary below the center, and an angle of orientation $\theta$.

(3) Two points, corresponding to the centers of the whites of the eyes, which are attracted to peaks in the image intensity. These points are labelled by $\vec{x}_e + p_1(\cos\theta, \sin\theta)$ and $\vec{x}_e + p_2(\cos\theta, \sin\theta)$, where $p_1 \geq 0$ and $p_2 \leq 0$. The point $\vec{x}_e$ lies at the center of the eye and $\theta$ corresponds to the orientation of the eye.

(4) The regions between the bounding contour and the iris also correspond to the whites of the eyes. They will be attracted to large values in the image intensity.

These components are linked together by three types of forces: (i) forces which encourage $\vec{x}_e$ and $\vec{x}_c$ to be close together, (ii) forces which make the width $2b$ of the eye roughly four times the radius $r$ of the iris, and (iii) forces which encourage the centers of the whites of the eyes to be roughly midway from the center of the eye to the boundary.

The template is illustrated in figure 1. It has a total of eleven parameters: $\vec{x}_c$, $\vec{x}_e$, $p_1$, $p_2$, $r$, $a$, $b$, $c$ and $\theta$ (the points $\vec{x}_c$ and $\vec{x}_e$ contribute two coordinates each). All of these are allowed to vary during the matching.

Figure 1. See text.

To give the explicit representation for the boundary we first define two unit vectors

$$\vec{e}_1 = (\cos\theta, \sin\theta) \tag{1}$$

and

$$\vec{e}_2 = (-\sin\theta, \cos\theta), \tag{2}$$

which change as the orientation of the eye changes. A point $\vec{x}$ in space can be represented by $(x_1, x_2)$ where

$$\vec{x} = x_1 \vec{e}_1 + x_2 \vec{e}_2. \tag{3}$$

Using these coordinates the top half of the boundary can be represented by a section of a parabola with $x_1 \in [-b, b]$

$$x_2 = a - \frac{a}{b^2} x_1^2. \tag{4}$$

Note that the maximal height, $x_2$, of the parabola is $a$ and the height is zero at $x_1 = \pm b$. Similarly the lower half of the boundary is given by

$$x_2 = -c + \frac{c}{b^2} x_1^2, \tag{5}$$

where $x_1 \in [-b, b]$.

3.1 The Energy Function for the Eye Template

We now define a potential energy function for the image which will be minimized as a function of the parameters of the template. This energy function not only ensures that the algorithm will converge, by acting as a Lyapunov function, but also gives a measure of the goodness of fit of the template.
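As an illustration, the template geometry of equations (1) to (5) can be sketched in a few lines of Python. This is a minimal sketch rather than the implementation used for the experiments: the function name `eye_boundary`, the sample count `n`, and the choice to take the eye center as the origin of the $(x_1, x_2)$ coordinates are assumptions made for the example.

```python
import math


def eye_boundary(xe, a, b, c, theta, n=5):
    """Sample the upper and lower parabolic boundaries of the eye template.

    xe    : (x, y) center of the eye (assumed origin of the x1, x2 frame)
    a, c  : maximum heights of the boundary above and below the center
    b     : half-width of the eye
    theta : orientation in radians
    n     : number of samples per parabola (hypothetical parameter, n >= 2)
    Returns (upper, lower), two lists of image-plane points.
    """
    e1 = (math.cos(theta), math.sin(theta))     # equation (1)
    e2 = (-math.sin(theta), math.cos(theta))    # equation (2)

    def to_image(x1, x2):
        # Equation (3), shifted by the eye center (an assumption of this sketch).
        return (xe[0] + x1 * e1[0] + x2 * e2[0],
                xe[1] + x1 * e1[1] + x2 * e2[1])

    upper, lower = [], []
    for i in range(n):
        x1 = -b + 2.0 * b * i / (n - 1)                        # x1 in [-b, b]
        upper.append(to_image(x1, a - a / b**2 * x1**2))       # equation (4)
        lower.append(to_image(x1, -c + c / b**2 * x1**2))      # equation (5)
    return upper, lower
```

With `theta = 0` the upper parabola peaks at height `a` above the center, the lower one dips to `-c`, and both meet the axis at `x1 = -b` and `x1 = b`, in line with the remark after equation (4).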

The complete energy function $E_c(\vec{x}_c, \vec{x}_e, p_1, p_2, a, b, c, r, \theta)$ is given as a combination of terms due to valley, edge, peak, image and internal potentials. More precisely,

$$E_c = E_v + E_e + E_i + E_p + E_{\text{internal}}, \tag{6}$$

where: (i) The valley potentials are given by the integral over the interior of the circle divided by the area of the circle,

$$E_v = -\frac{c_1}{\text{Area}} \iint_{\text{circle-area}} \Phi_v(\vec{x})\, dA. \tag{7}$$

(ii) The edge potentials are given by the integrals over the boundaries of the circle divided by its length and over the parabolae divided by their lengths,

$$E_e = -\frac{c_2}{\text{Length}} \int_{\text{circle-bound}} \Phi_e(\vec{x})\, ds - \frac{c_3}{\text{Length}} \int_{\text{parabola-bound}} \Phi_e(\vec{x})\, ds. \tag{8}$$

(iii) The image potentials have contributions which attempt to minimize the total brightness inside the circle divided by its area,

$$E_i = \frac{c_4}{\text{Area}} \iint_{\text{circle-area}} \Phi_i(\vec{x})\, dA \tag{9}$$

and maximize it between the circle and the parabolae (again divided by the area),

$$- \frac{c_5}{\text{Area}} \iint_{\text{whites}} \Phi_i(\vec{x})\, dA. \tag{10}$$

(iv) The peak potentials, evaluated at the two peak points, are given by

$$E_p = c_6 \{\Phi_p(\vec{x}_e + p_1 \vec{e}_1) + \Phi_p(\vec{x}_e + p_2 \vec{e}_1)\}. \tag{11}$$

(v) The internal potentials are given by

$$E_{\text{internal}} = \frac{k_1}{2}(\vec{x}_e - \vec{x}_c)^2 + \frac{k_2}{2}\left(p_1 - \frac{1}{2}\{r + b\}\right)^2 + \frac{k_2}{2}\left(p_2 + \frac{1}{2}\{r + b\}\right)^2 + \frac{k_3}{2}(b - 2r)^2. \tag{12}$$

The $\{c_i\}$ and $\{k_i\}$ are usually fixed coefficients but we will allow them to change values (corresponding to different epochs) as the process proceeds. Changing the values of these coefficients enables us to use a matching strategy in which different parts of the template guide the matching at different stages. For example, the valley in the image intensity corresponding to the iris is very salient and is more effective at "attracting" the template from long distances than any other feature. Thus its strength, which is proportional to $c_1$, should initially be large. Orienting the template correctly is usually best performed by the peak terms, thus $c_6$ should be large in the middle period. The constants $c_2$ and $c_3$ can then be increased to help find the edges. Finally, the terms involving the image intensity can be used to make fine scale corrections. This corresponds to a strategy in which the position of the eye is mainly found by the valley force, the orientation by the peak force, and the fine scale detail by the edge and intensity forces. In this scenario the values of the $c$'s will be changed dynamically. Typical values for the coefficients are $(c_1, c_2, c_3, c_4, c_5, c_6) \approx (4000, 50, 50, 125, 150, 50)$ and $(k_1, k_2, k_3) \approx (10, 1, 0.05)$.

The individual energy terms can be written as functions of the parameter values. For example, the sum over the boundary can be expressed as an integral function of $\vec{x}_e$, $a$, $b$, $c$ and $\theta$, where $s$ corresponds to the arc length of the curve and Length to its total length. Note that scale independence is achieved by dividing line integrals by their total length and double integrals (over regions) by their area.

The minimization is done by steepest descent of the energy function in parameter space. It is assumed that preprocessing, or interactions between different templates (see section (7)), will allow the eye-template to start relatively near the correct position. In some situations several different templates may be required.

Thus the update rule for a parameter, for example $r$, is given by

$$\frac{dr}{dt} = -\frac{\partial E_c}{\partial r}.$$

These terms are explicitly calculated in Yuille, Cohen and Hallinan (1988).

4 Simulation results for eyes

This theory was tested on real images using a SUN4 computer. The valleys, peaks and edges are first extracted and smoothed. The template is then given initial parameter values, positioned in the image and allowed to deform itself using the update equations.

Some initial experimentation was needed to find good values for the coefficients and a number of problems arose. For example, the intensity and valley terms over the circle attempt to find the maximum value of the potential terms averaged inside the circle. This led to the circle shrinking to a point at the darkest part of the iris. This effect could be countered by strengthening the edge terms, which pull the circle out to the edge between the iris and the whites of the eye. Another problem arose because the iris might also be partially hidden by the boundary of the eye, thus the part of the circle outside the boundary cannot be allowed to interact with the image. This can be dealt with by only considering the area of the circle inside the bounding parabolae.

The system worked well after good values were found for the
coefficients. The templates usually converged to the eye provided they were started at or below it. The valleys from the eyebrows caused problems if the template was started above the eye.

The values of the coefficients changed automatically during the course of the program to define six distinct epochs:

(i) The coefficients of the valley forces are strong and the coefficients of the peak, edge and intensity forces are zero. During this epoch the valley forces pull the template to the eye.

(ii) The coefficients of the intensity forces for the circle are increased. This helps scale the circle to the correct size of the iris.

(iii) The edge coefficients for the boundary of the circle increase. This fine tunes the size of the circle as it locks onto the iris.

(iv) The peak coefficients increase. This enables the peak forces to rotate the template and get the correct orientation.

(v) The coefficients of the intensity forces for the whites of the eyes are increased. This helps adjust the size of the outer boundary of the template.

(vi) The coefficients of the edges of the boundary are increased. This fine tunes the positions of the boundaries.

Figure 2. A dynamic sequence for the eye, left to right and top to bottom. The first frame shows the initial configuration and the remaining frames show the results at the ends of the epochs.

The program changes epoch automatically when it has reached a steady state of the energy function with the appropriate coefficient values (i.e. when it thinks it has accomplished its goals for that epoch).

Figure 2 illustrates the program running in the different epochs. Note that the template can start some distance away from the eye, can scale the iris, rotate the eye and lock onto the edges.

The runtime for the program is between five and ten minutes on a SUN4.

5 Extensions and Future Work

Yuille, Cohen and Hallinan (1988) describe how this work can be extended to detect mouths. We define a parameterized template for the mouth and allow it to adjust itself to the image, see figure 3.

Figure 3. A dynamic sequence for the mouth-open template on an open and a closed mouth. In the upper picture the template is pulled in by the peak and edge forces from the teeth. In the lower picture the template is mainly pulled in by the valley forces and the region for the teeth vanishes.

It seems relatively straightforward to find templates for the other "internal" features of the face, such as eyebrows, noses, chins and moustaches. It is less clear how to generalize this idea to find "external" features such as the ears or hair, or to find internal regions such as the forehead or the cheeks. However, Identikit programs used by police forces are able to represent a large variety of faces by using a comparatively small number of templates (120 eyes, for example). Such programs should be able to guide us in the search for reliable ways to parameterize features.

In some situations it may be necessary to use several different templates for the same feature. The edge, valley and peak fields are quite different if the eye is partially in shadow. This might be tackled by using an eye-shadow template in which part of the eye is assumed to be shaded.
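The epoch strategy used for the eye template in sections 3.1 and 4 (steepest descent on the energy, with the coefficients changed once a steady state is reached) can be sketched as follows. This is only an illustrative sketch: the quadratic "valley" and "edge" terms are toy stand-ins for the image potentials, the gradient is estimated by finite differences rather than by the explicit expressions of Yuille, Cohen and Hallinan (1988), and all function names, step sizes and tolerances are assumptions.

```python
def numerical_gradient(energy, params, h=1e-5):
    """Central-difference estimate of dE/dp for every parameter."""
    grad = {}
    for name in params:
        up = dict(params)
        up[name] += h
        down = dict(params)
        down[name] -= h
        grad[name] = (energy(up) - energy(down)) / (2.0 * h)
    return grad


def descend(energy, params, step=0.1, tol=1e-10, max_iter=10000):
    """Steepest descent until the energy reaches a steady state."""
    params = dict(params)
    previous = energy(params)
    for _ in range(max_iter):
        grad = numerical_gradient(energy, params)
        for name in params:
            params[name] -= step * grad[name]   # discrete form of dp/dt = -dE/dp
        current = energy(params)
        if abs(previous - current) < tol:       # steady state: end of the epoch
            break
        previous = current
    return params


def run_epochs(terms, schedule, params):
    """Minimize sum(coefficient * term) once per epoch, carrying parameters over."""
    history = []
    for coeffs in schedule:
        def energy(p, coeffs=coeffs):
            return sum(coeffs.get(name, 0.0) * term(p)
                       for name, term in terms.items())
        params = descend(energy, params)
        history.append(dict(params))
    return history


# Toy stand-ins (assumptions): the "valley" term pulls the position x to 3,
# the "edge" term pulls the radius r to 2.
terms = {
    "valley": lambda p: (p["x"] - 3.0) ** 2,
    "edge": lambda p: (p["r"] - 2.0) ** 2,
}
schedule = [
    {"valley": 1.0},                # epoch 1: only the valley force is active
    {"valley": 1.0, "edge": 1.0},   # epoch 2: the edge force is switched on
]
history = run_epochs(terms, schedule, {"x": 0.0, "r": 0.5})
```

In this toy run the first epoch moves only the position, since the edge coefficient is zero, and the second epoch then adjusts the radius while the position stays locked in. This mirrors, in a much simplified form, the way the valley forces first pull the template to the eye before the other forces refine it.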

Our strategy for the implementation was to use preprocessing to set the initial values of the template parameters. An alternative method would be to start several deformable templates off in parallel and see which gives the best results. This would require some criteria for selecting the best fit. A natural choice would be the one with the lowest final energy function. This, however, might need to be supplemented by taking into account the spatial relationships to other features and the a priori probability of the final parameter values. In some special cases it may be possible for the energy to be low but for the parameter values to be extremely unlikely. Such a situation can occur if the mouth template gets started on the eye and becomes grotesquely deformed (Yuille, Cohen and Hallinan 1988).

Interactions between templates may also be necessary for detection. The features of the face are constrained to have certain spatial relationships with each other, and this should affect the detection. These forces might be mediated by springs. Moreover, once a feature is detected the potential fields corresponding to it can be removed, thereby making it easier to detect the remaining features. For example, once the eyebrows are detected removing the valley fields associated with them would make it easier to detect the eyes.

Minimizing the cost function can be thought of in terms of extremizing a probability distribution a la Bayes, in the spirit of the Clifford-Hammersley (1964) theorem. The form of the template gives an a priori expectation for the structure of the feature, and the edge, valley and peak fields correspond to the probability of the edges, valleys and peaks given the template.

Deformable templates seem to have a large number of possible applications. Nitzberg (1988) has used them for detecting triple points in an image. Another possibility is to use them for perceptual grouping; a set of these templates (capable of describing many salient shapes) could interact with the image and those with the best matches (least energy) would be chosen to order the image. The visual system would "hypothesize" many different structures, allow them to interact with the image and then choose the best. It is unclear, however, how many templates would be needed, how many different starting points in the image and how computationally intensive this procedure would be. A second possibility is to use deformable templates to describe the three dimensional surfaces and allow the reflectance function to be specified by a finite number of parameters (allowing for possible directions of the light source, different types of reflectance, etc). Suppose, for example, that we have a deformable template representing the three-dimensional geometry of the nose. The reflectance function might also be specified by a finite set of parameters (allowing Lambertian plus specularity). The geometry of the nose and its reflectance will then be described by a finite set of parameters. This can then be related to the image by the image irradiance equation. There will then (usually) be a sufficient number of equations to solve for the parameters.

The facial features detected by deformable templates can be used as inputs to a recognition system. Another interesting application would be to use them as inputs to a shape from symmetry scheme (Gordon 1988) to detect the orientation of a face.

6 Conclusion

A serious problem for detection of edges, or other features, seems to lie in combining local information, which may be easily obtained, into a global structure. Snakes (Kass, Witkin and Terzopoulos 1987, Terzopoulos, Witkin and Kass 1987) provide an elegant way of linking local information to form edges, and of providing a priori knowledge about the likely structure of an edge. For the purpose of detecting facial features, however, a lot more a priori information is available and a deformable template is able to capture it. Moreover, such templates are not only able to detect a feature but can also provide a description of it for classification and matching to a data base.

7 Acknowledgements

A.L.Y. would like to thank the Brown, Harvard and M.I.T. Center for Intelligent Control Systems for a United States Army Research Office grant number DAAL03-86C-0171. We would also like to thank Roger Brockett for his support. Conversations with Jim Clark, David Mumford, Petros Maragos, Mark Nitzberg, Gaile Gordon and Roger Brockett were extremely useful.

References

Burr, D.J. "A Dynamic Model for Image Registration". Computer Graphics and Image Processing. 15, pp 102-112. 1981.
Burr, D.J. "Elastic Matching of Line Drawings". IEEE Trans. Pattern Analysis and Machine Intelligence. PAMI-3, No. 6, pp 708-713. 1981.
Durbin, R. and Willshaw, D.J. "An analogue approach to the travelling salesman problem using an elastic net method". Nature. 1987.
Durbin, R., Szeliski, R. and Yuille, A.L. In preparation.
Fischler, M.A. and Elschlager, R.A. "The representation and matching of pictorial structures". IEEE Trans. Computers. Vol 22. 1. 1973.
Gordon, G. "Shape from symmetry". Submitted to CVPR. 1988.
Hammersley, J.M. and Handscomb, D.C. Monte-Carlo Methods. Methuen and Company. London. 1964.
Kanade, T. Computer recognition of human faces. Birkhauser Verlag. Basel and Stuttgart. 1977.
Kass, M., Witkin, A. and Terzopoulos, D. "Snakes: Active Contour Models". Proc. First International Conference on Computer Vision. London. June 1987.
Maragos, P. "Tutorial on Advances in Morphological Image Processing and Analysis". Optical Engineering, vol. 26, pp. 623-632, July 1987.
Nitzberg, M. "Triple point detection". Submitted to CVPR. 1988.
Pentland, A. "Recognition by Parts". ICCV. London. 1987.
Serra, J. Image Analysis and Mathematical Morphology. NY: Acad. Press, 1982.
Terzopoulos, D., Witkin, A., and Kass, M. "Symmetry-seeking models for 3D Object Recognition". Proc. First International Conference on Computer Vision. London. June 1987.
Yuille, A.L., Cohen, D.S. and Hallinan, P.W. "Facial feature extraction by deformable templates". Harvard Robotics Lab. Tech. Rep 88-2. 1988.
