
An Introduction to Computing with Neural Nets


Richard P. Lippmann

Abstract

... using massively parallel nets composed of many computational elements ...

... problems and studying real biological nets may also change the way we think about problems and lead to new insights and algorithmic improvements.

Work on artificial neural net models has a long history. Development of detailed mathematical models began more than 40 years ago with the work of McCulloch and Pitts [30], Hebb [17], Rosenblatt [39], Widrow [47] and others [38]. More recent work by Hopfield [18,19,20], Rumelhart and McClelland [40], Sejnowski [43], Feldman [9], Grossberg [15], and others has led to a new resurgence of the field. This new interest is due to the development of new net topologies and algorithms [18,19,20,41,9], new analog VLSI implementation techniques [31], and some intriguing demonstrations [43,20], as well as by a growing fascination with the functioning of the human brain. Recent interest is also driven by the realization that human-like performance in the areas of speech and image recognition will require enormous amounts of processing. Neural nets provide one technique for obtaining the required processing capacity using large numbers of simple processing elements operating in parallel.

This paper provides an introduction to the field of neural nets by reviewing six important neural net models that can be used for pattern classification. These massively parallel nets are important building blocks which can be used to construct more complex systems. The main purpose of this review is to describe the purpose and design of each net in detail, to relate each net to existing pattern classification and clustering algorithms that are normally implemented on sequential von Neumann computers, and to illustrate design principles used to obtain parallelism using neural-like processing elements.

Neural net and traditional classifiers

Block diagrams of traditional and neural net classifiers are presented in Fig. 2. Both types of classifiers determine which of M classes is most representative of an unknown static input pattern containing N input elements. In a speech recognizer the inputs might be the output envelope values from a filter bank spectral analyzer sampled at one time instant and the classes might represent different vowels. In an image classifier the inputs might be the gray scale level of each pixel for a picture and the classes might represent different objects.

The traditional classifier in the top of Fig. 2 contains two stages. The first computes matching scores for each class and the second selects the class with the maximum score. Inputs to the first stage are symbols representing values of the N input elements. These symbols are entered sequentially and decoded from the external symbolic form into an internal representation useful for performing arithmetic and symbolic operations. An algorithm computes a matching score for each of the M classes which indicates how closely the input matches the exemplar pattern for each class. This exemplar pattern is that pattern which is most representative of each class. In many situations a probabilistic model is used to model the generation of input patterns from exemplars, and the matching score represents the likelihood or probability that the input pattern was generated from each of the M possible exemplars. In those cases, strong assumptions are typically made concerning underlying distributions of the input elements. Parameters of distributions can then be estimated using training data, as shown in Fig. 2. Multivariate Gaussian distributions are often used, leading to relatively simple algorithms for computing matching scores [7]. Matching scores are coded into symbolic representations and passed sequentially to the second stage of ...


Figure 2. Block diagrams of traditional (A) and neural net (B) classifiers. Inputs and outputs of the traditional classifier are passed serially and internal computations are performed sequentially. In addition, parameters are typically estimated from training data and then held constant. Inputs and outputs to the neural net classifier are in parallel and internal computations are performed in parallel. Internal parameters or weights are typically adapted or trained during use using the output values and labels specifying the correct class.
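As a concrete illustration of the two-stage structure described above, the following sketch (not from the original article) computes one matching score per class from an assumed multivariate Gaussian exemplar model and then selects the class with the maximum score; the function names and the use of log-likelihoods as scores are illustrative assumptions.

# Minimal sketch of the two-stage traditional classifier of Fig. 2.
# Stage one computes a matching score for each of the M classes (here a
# multivariate Gaussian log-likelihood, an assumed model); stage two
# selects the class with the maximum score.
import numpy as np

def matching_scores(x, means, covariances):
    """Return one log-likelihood matching score per class for input x."""
    scores = []
    for mu, cov in zip(means, covariances):
        d = x - mu
        inv_cov = np.linalg.inv(cov)
        _, log_det = np.linalg.slogdet(cov)
        scores.append(-0.5 * (d @ inv_cov @ d + log_det + len(x) * np.log(2 * np.pi)))
    return np.array(scores)

def classify(x, means, covariances):
    """Stage two: pick the class whose exemplar model scores highest."""
    return int(np.argmax(matching_scores(x, means, covariances)))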



... based pre-processors modeled after human sensory systems. A pre-processor for image classification modeled after the retina and designed using analog VLSI circuitry is described in [31]. Pre-processor filter banks for speech recognition that are crude analogs of the cochlea have also been constructed [34, 29]. More recent physiologically- ...

... number of nodes available in the first stage.



... the matching score outputs of the lower subnet to settle and initialize the output values of the MAXNET. The input is then removed and the MAXNET iterates until the output of only one node is positive. Classification is then complete and the selected class is that corresponding to the node with a positive output. The behavior of the Hamming net is illustrated in Fig. 7.
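The MAXNET iteration sketched below is a rough functional rendering of the behavior just described: each node reinforces itself and inhibits every other node by a small amount eps < 1/M until only one output stays positive. The synchronous update, the choice of eps, and the iteration cap are assumptions.

# Sketch of MAXNET lateral inhibition: nodes are initialized with the
# matching scores, then each node subtracts eps times the sum of the
# other nodes' outputs and is clipped at zero, until one node survives.
import numpy as np

def maxnet(scores, eps=None, max_iters=1000):
    m = len(scores)
    eps = eps if eps is not None else 1.0 / (m + 1)    # must satisfy eps < 1/M
    y = np.array(scores, dtype=float)
    for _ in range(max_iters):
        y = np.maximum(0.0, y - eps * (y.sum() - y))   # inhibit all other nodes
        if np.count_nonzero(y > 0) <= 1:
            break
    return int(np.argmax(y))   # index of the selected class (assumes a unique maximum)

# e.g. maxnet([7.2, 9.1, 8.5]) selects node 1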


Figure 7. Node outputs for a Hamming net with 1,000 binary inputs and 100 output nodes or classes. Output values of all 100 nodes are presented at time zero and after 3, 6, and 9 iterations. The input was the exemplar pattern corresponding to output node 50.

... only 1,100 connections while the Hopfield net requires almost 10,000. Furthermore, the difference in number of connections required increases as the number of inputs increases, because the number of connections in the Hop- ...

... maximum of M inputs. A net that uses these subnets to pick the maximum of 8 inputs is presented in Fig. 9.
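The connection counts quoted in the Hopfield comparison above are consistent with, for example, N = 100 binary inputs and M = 10 classes if the Hamming net is charged N*M connections for the lower matching-score subnet plus M*M for the MAXNET, while the Hopfield net is charged roughly N*N connections; these particular values and the counting convention are assumptions used only to illustrate the comparison.

# Illustrative connection-count comparison under the assumptions above.
N, M = 100, 10                          # assumed number of inputs and classes
hamming_connections = N * M + M * M     # lower subnet plus MAXNET -> 1,100
hopfield_connections = N * N            # fully interconnected nodes -> 10,000
print(hamming_connections, hopfield_connections)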


Figure 9. A feed-forward net that determines which of eight inputs is maximum using a binary tree and comparator subnets from Fig. 8. After an input vector is applied, only that output corresponding to the maximum input element will be high. Internal thresholds on threshold-logic nodes (open circles) and on hard limiting nodes (filled circles) are zero except for the output nodes. Thresholds in the output nodes are 2.5. Weights for the comparator subnets are as in Fig. 8 and all other weights are 1.
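A rough functional sketch of the binary-tree selection in Fig. 9 is given below: pairs of inputs are compared, the larger value of each pair moves up the tree, and the index that wins the final comparison drives a one-high output. It mimics what the net computes rather than its exact weights and thresholds, and it assumes the number of inputs is a power of two, as in the figure.

# Functional sketch of the Fig. 9 binary tree of comparator subnets:
# only the output line corresponding to the maximum input ends up high.
def tree_argmax(values):
    """Return a one-hot list marking the maximum input (len(values) a power of two)."""
    layer = [(v, i) for i, v in enumerate(values)]      # (value, originating index)
    while len(layer) > 1:
        layer = [max(layer[j], layer[j + 1], key=lambda t: t[0])   # comparator subnet
                 for j in range(0, len(layer), 2)]
    winner = layer[0][1]
    return [1 if i == winner else 0 for i in range(len(values))]

# e.g. tree_argmax([0.2, 0.9, 0.4, 0.7, 0.1, 0.3, 0.8, 0.5]) -> [0, 1, 0, 0, 0, 0, 0, 0]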
... variant of the perceptron convergence procedure to apply for M classes. This requires a structure identical to the Hamming net and a classification rule that selects the class corresponding to the node with the maximum output. During adaptation the desired output values can be set to 1 for the correct class and 0 for all others.

The perceptron structure can be used to implement either a Gaussian maximum likelihood classifier or classifiers which use the perceptron training algorithm or one of its variants. The choice depends on the application. The perceptron training algorithm makes no assumptions concerning the shape of underlying distributions but focuses on errors that occur where distributions overlap. It may thus be more robust than classical techniques and work well when inputs are generated by nonlinear processes and are heavily skewed and non-Gaussian. The Gaussian classifier makes strong assumptions concerning underlying distributions and is more appropriate when distributions are known and match the Gaussian assumption. The adaptation algorithm defined by the perceptron convergence procedure is simple to implement and doesn't require storing any more information than is present in the weights and the threshold. The Gaussian classifier can be made adaptive [24], but extra information must be stored and the computations required are more complex.

Neither the perceptron convergence procedure nor the Gaussian classifier is appropriate when classes cannot be separated by a hyperplane. Two such situations are presented in the upper section of Fig. 14. The smooth closed contours labeled A and B in this figure are the input distributions for the two classes when there are two continuous valued inputs to the different nets. The shaded areas are the decision regions created by a single-layer perceptron and other feed-forward nets. Distributions for the two classes for the exclusive OR problem are disjoint and cannot be separated by a single straight line. This problem was used to illustrate the weakness of the perceptron by Minsky and Papert [32]. If the lower left B cluster is taken to be at the origin of this two-dimensional space, then the output of the classifier must be high only if one but not both of the inputs is high. One possible decision region for class A which a perceptron might create is illustrated by the shaded region in the first row of Fig. 14. Input distributions for the second problem shown in this figure are meshed and also cannot be separated by a single straight line. Situations similar to these may occur when parameters such as formant frequencies are used for speech recognition.
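A minimal sketch of the M-class variant described above follows: one perceptron node per class, classification by the node with the maximum output, and perceptron-style updates with desired outputs of 1 for the correct class and 0 for all others. The gain value, the number of passes, and the use of hard-limited outputs during adaptation are assumptions.

# Sketch of the perceptron convergence procedure extended to M classes.
import numpy as np

def train_multiclass_perceptron(X, labels, num_classes, gain=0.1, epochs=20):
    """X: (samples, N) inputs; labels: correct class index per sample."""
    W = np.zeros((num_classes, X.shape[1]))   # one weight vector per class node
    b = np.zeros(num_classes)                 # one threshold (bias) per node
    for _ in range(epochs):
        for x, c in zip(X, labels):
            outputs = ((W @ x + b) > 0).astype(float)   # hard-limited node outputs
            desired = np.zeros(num_classes)
            desired[c] = 1.0                            # 1 for the correct class, 0 otherwise
            error = desired - outputs
            W += gain * np.outer(error, x)              # perceptron-style weight update
            b += gain * error
    return W, b

def classify(x, W, b):
    return int(np.argmax(W @ x + b))          # node with the maximum output wins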
... tonotopic organization in the auditory pathway extends up to the auditory cortex [33, 21]. Although much of the low-level organization is genetically pre-determined, it is likely that some of the organization at higher levels is created during learning by algorithms which promote self-organization. Kohonen [22] presents one such algorithm which produces what he calls self-organizing feature maps similar to those that occur in the brain.

Kohonen's algorithm creates a vector quantizer by adjusting weights from common input nodes to M output nodes arranged in a two-dimensional grid as shown in Fig. 17. Output nodes are extensively interconnected with many local connections. Continuous-valued input vectors are presented sequentially in time without specifying the desired output. After enough input vectors have been presented, weights will specify cluster or vector centers that sample the input space such that the point density function of the vector centers tends to approximate the probability density function of the input vectors [22]. In addition, the weights will be organized such that topologically close nodes are sensitive to inputs that are physically similar. Output nodes will thus be ordered in a natural manner. This may be important in complex systems with many layers of processing because it can reduce lengths of inter-layer connections.

The algorithm that forms feature maps requires a neighborhood to be defined around each node as shown in Fig. 18. This neighborhood slowly decreases in size with time as shown. Kohonen's algorithm is described in Box 7. Weights between input and output nodes are initially set to small random values and an input is presented. The distance between the input and all nodes is computed as shown. If the weight vectors are normalized to have constant length (the sum of the squared weights from all inputs to each output are identical), then the node with the minimum Euclidean distance can be found by using the net of Fig. 17 to form the dot product of the input and the weights. The selection required in step 4 then turns into a problem of finding the node with a maximum value. This node can be selected using extensive lateral inhibition as in the MAXNET in the top of Fig. 6. Once this node is selected, weights to it and to other nodes in its neighborhood are modified to make these nodes more responsive to the current input. This process is repeated for further inputs. Weights eventually converge and are fixed after the gain term in step 5 is reduced to zero.
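The training loop below is a compact sketch of the procedure just outlined: find the output node whose weight vector is closest to the input, then move that node and its grid neighbors toward the input while the gain term and the neighborhood radius shrink with time. The grid size, the linear decay schedules, and the initial ranges are assumptions, not values from Box 7.

# Sketch of Kohonen-style self-organizing feature map training.
import numpy as np

def train_feature_map(inputs, grid=(10, 10), steps=5000, gain0=0.5, radius0=5.0):
    rows, cols = grid
    rng = np.random.default_rng(0)
    W = rng.uniform(-0.1, 0.1, size=(rows, cols, inputs.shape[1]))  # small random weights
    coords = np.dstack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"))
    for t in range(steps):
        x = inputs[rng.integers(len(inputs))]           # present one input vector
        dist = np.linalg.norm(W - x, axis=2)            # distance from input to every node
        best = np.unravel_index(np.argmin(dist), dist.shape)   # minimum-distance node
        gain = gain0 * (1.0 - t / steps)                # gain term decreases with time
        radius = max(1.0, radius0 * (1.0 - t / steps))  # neighborhood shrinks with time
        in_neighborhood = (np.linalg.norm(coords - np.array(best), axis=2) <= radius)[..., None]
        W += gain * in_neighborhood * (x - W)           # move neighborhood toward the input
    return W

# e.g. weights for two independent uniformly distributed inputs, as in the Fig. 19 example:
# train_feature_map(np.random.default_rng(1).uniform(0.0, 1.0, size=(5000, 2)))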

An example of the behavior of this algorithm is presented in Fig. 19. The weights for 100 output nodes are plotted in these six subplots when there are two random independent inputs uniformly distributed over the region enclosed by the boxed areas. Line intersections in these plots specify weights for one output node. Weights from ...

ACKNOWLEDGMENTS

I have constantly benefited from discussions with Ben Gold and Joe Tierney. I would also like to thank Don Johnson for his encouragement, Bill Huang for his simulation studies, and Carolyn for her patience.

REFERENCES

[1] Y. S. Abu-Mostafa and D. Psaltis, "Optical Neural Computers," Scientific American, 256, 88-95, March 1987.
[2] E. B. Baum, J. Moody, and F. Wilczek, "Internal Representations for Associative Memory," NSF-ITP-86-138, Institute for Theoretical Physics, University of California, Santa Barbara, California, 1986.
[3] G. A. Carpenter and S. Grossberg, "Neural Dynamics of Category Learning and Recognition: Attention, Memory Consolidation, and Amnesia," in J. Davis, R. Newburgh, and E. Wegman (Eds.), Brain Structure, Learning, and Memory, AAAS Symposium Series, 1986.
[4] M. A. Cohen and S. Grossberg, "Absolute Stability of Global Pattern Formation and Parallel Memory Storage by Competitive Neural Networks," IEEE Trans. Syst. Man Cybern. SMC-13, 815-826, 1983.
[5] B. Delgutte, "Speech Coding in the Auditory Nerve: II. Processing Schemes for Vowel-Like Sounds," J. Acoust. Soc. Am. 75, 879-886, 1984.
[6] J. S. Denker, AIP Conference Proceedings 151, Neural Networks for Computing, Snowbird, Utah, AIP, 1986.
[7] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York (1973).
[8] J. L. Elman and D. Zipser, "Learning the Hidden Structure of Speech," Institute for Cognitive Science, University of California at San Diego, ICS Report 8701, Feb. 1987.
[9] J. A. Feldman and D. H. Ballard, "Connectionist Models and Their Properties," Cognitive Science, Vol. 6, 205-254, 1982.
[10] R. G. Gallager, Information Theory and Reliable Communication, John Wiley & Sons, New York (1968).
[11] O. Ghitza, "Robustness Against Noise: The Role of Timing-Synchrony Measurement," in Proceedings International Conference on Acoustics, Speech and Signal Processing, ICASSP-87, Dallas, Texas, April 1987.
[12] B. Gold, "Hopfield Model Applied to Vowel and Consonant Discrimination," MIT Lincoln Laboratory Technical Report, TR-747, AD-A169742, June 1986.
[13] H. P. Graf, L. D. Jackel, R. E. Howard, B. Straughn, J. S. Denker, W. Hubbard, D. M. Tennant, and D. Schwartz, "VLSI Implementation of a Neural Network Memory With Several Hundreds of Neurons," in J. S. Denker (Ed.), AIP Conference Proceedings 151, Neural Networks for Computing, Snowbird, Utah, AIP, 1986.
[14] P. M. Grant and J. P. Sage, "A Comparison of Neural Network and Matched Filter Processing for Detecting Lines in Images," in J. S. Denker (Ed.), AIP Conference Proceedings 151, Neural Networks for Computing, Snowbird, Utah, AIP, 1986.
[15] S. Grossberg, The Adaptive Brain I: Cognition, Learning, Reinforcement, and Rhythm, and The Adaptive Brain II: Vision, Speech, Language, and Motor Control, Elsevier/North-Holland, Amsterdam (1986).
[16] J. A. Hartigan, Clustering Algorithms, John Wiley & Sons, New York (1975).
[17] D. O. Hebb, The Organization of Behavior, John Wiley & Sons, New York (1949).
[18] J. J. Hopfield, "Neural Networks and Physical Systems with Emergent Collective Computational Abilities," Proc. Natl. Acad. Sci. USA, Vol. 79, 2554-2558, April 1982.
[19] J. J. Hopfield, "Neurons with Graded Response Have Collective Computational Properties Like Those of Two-State Neurons," Proc. Natl. Acad. Sci. USA, Vol. ...
... Univ. Technical Report JHU/EECS-86/01, 1986.
[44] S. Seneff, "A Computational Model for the Peripheral Auditory System: Application to Speech Recognition Research," in Proceedings International Conference on Acoustics, Speech and Signal Processing, ICASSP-86, 4, 37.8.1-37.8.4, 1986.
[45] D. W. Tank and J. J. Hopfield, "Simple 'Neural' Optimization Networks: An A/D Converter, Signal Decision Circuit, and a Linear Programming Circuit," IEEE Trans. Circuits Systems CAS-33, 533-541, 1986.
[46] D. J. Wallace, "Memory and Learning in a Class of Neural Models," in B. Bunk and K. H. Mütter (Eds.), Proceedings of the Workshop on Lattice Gauge Theory, Wuppertal, 1985, Plenum (1986).
[47] B. Widrow and M. E. Hoff, "Adaptive Switching Circuits," 1960 IRE WESCON Conv. Record, Part 4, 96-104, August 1960.
[48] B. Widrow and S. D. Stearns, Adaptive Signal Processing, Prentice-Hall, New Jersey (1985).
