
Communication Theory and Cybernetics†

DENNIS GABOR*
LIKE other sciences which have grown out of techniques, communication theory is an effort of abstraction and of systematization, applied to a vast material which has developed mostly by ad hoc solutions of urgent practical problems, often ingenious, seldom systematic. The obvious danger for such a theory is that in its turn it will become abstract, systematic and sterile, like the theory of painting. It will be one of my main objects to show that, far from being sterile, communication theory is likely to become a powerful driving force in the future development of the techniques of communication and control. What distinguishes communication theory from other applications of mathematics to communication problems is the systematic study and exploitation of prior information, that is, of all that we know, or can know, in advance of a message or of a set of messages. Prior information is of two kinds. First of all, we must know our receiving instrument. This gives us what is called a priori structural information. The a priori structure is that which has been imparted to the messages by the receiving instrument, which acts like a kind of sieve. Second, we have often, though not always, some previous acquaintance with the source of the messages and its habits. This body of statistical information might enable us to form certain expectations of messages to come. Hence the division of the field into structural and statistical communication theory. The second is the larger and more important chapter, but I want to show that it pays to be well acquainted with the first, as it supplies us with useful tools for solving the problems presented by the statistical theory.

STRUCTURAL COMMUNICATION THEORY

The basic problem of structural communication theory is the adequate representation of signals. By adequate I mean that the representation must contain everything that can be observed by a physical receiver, and nothing else. Once it was believed that a sound, for instance, could be represented by some continuous function s(t), a moving picture by some continuous luminosity function L(x, y, t). The first step in structural communication theory is to replace this naive view by what I propose to call the Expansion Theorem. This was first formulated in a special and not rigorous form by Nyquist and by Küpfmüller (1924), and later restated in a generalized but still not rigorous form by the author (1946), as follows: In a frequency interval F and in a time interval T any signal can be described by at most 2FT independent data. In other words, 2FT is the number of degrees of freedom of the signal in this time-frequency domain. The proof consisted in showing that any signal can be linearly expanded in terms of certain elementary signals which do not spread appreciably beyond a unit area in the time-frequency diagram. Later Shannon (1949) found a special but more rigorous formulation, which as a mathematical theorem goes back to Whittaker (1915). Whittaker considered a certain interpolation formula for approximating an arbitrary function s(t), which is obtained as follows. Mark off on the time axis equidistant sampling points, spaced by Δt = 1/2F, starting, say, with t = 0. To every point t_n = n/2F associate the function

$$u_n(t) = \frac{\sin 2\pi F(t - n/2F)}{2\pi F(t - n/2F)}.$$

This function, for which Woodward has recently proposed the name of sinc function, is illustrated in Fig. 1. One now obtains what Whittaker has named the cardinal function by forming the series

$$[s(t)] = \sum_{n=-\infty}^{\infty} s(t_n)\, u_n(t).$$
† General Report presented at the Fourth Session of the Science Days: Symposium on Electronics and Television, organized and sponsored at Milan, April 12-17, 1954, by the National Research Council of Italy.
* Imperial College of Science and Technology, London, Eng.

Fig. 1-The Sampling Theorem, showing in (a) the Fourier transform of the sinc function illustrated at (b), and illustrating how an arbitrary function in (c) may be approximated by sinc components to obtain the cardinal function.

The cardinal function assumes at every sampling point t_n the same value as s(t), because the u_n(t) are zero at every sampling point except at t = t_n, where their value is unity. Whittaker showed that whatever the Fourier


spectrum of s(t) may be, the spectrum of the cardinal function is zero outside the frequency band between −F and +F.¹ Shannon expressed this in his Sampling Theorem in the form that in a waveband F the signal is completely determined by its samples at intervals 1/2F. The Sampling Theorem, though extremely useful, has the disadvantage that it is based on elementary signals with a rectangular spectrum, which cannot be realized with a physical filter. The time-transform of the frequency characteristic of a linear filter is the impulse response, that is to say the response to a unit impulse or delta function. Every physical response must be zero before the stimulus occurred, but the sinc signals vanish only at −∞; hence they can be only approximated by real filters with very large delay, making them practically useless. I therefore propose to generalize the theorem, and at the same time make it more physical, as follows: Assume a linear filter with a transfer characteristic R(f) which is nonzero inside a certain frequency band of width F and zero outside it. An observer behind the filter will not notice any difference if an arbitrary signal s(t) is replaced by the generalized cardinal function [s(t)], which is a linear expansion

$$[s(t)] = \sum_n c_n\, r(t - t_n), \qquad t_n = n/2F,$$

in terms of the impulse response function r(t) of the filter. The proof and the rule for calculating the expansion coefficients c_n are given in Appendix I. Fig. 2 is a simple illustration.

Fig. 2-The Expansion Theorem, showing at (c) how a given signal may be expanded in terms of the impulse responses of a physical filter. A typical impulse response is shown at (b) and the real and imaginary parts of its Fourier transform are shown at (a).

It is also shown in the Appendix that the Expansion Theorem can be expressed in the form of a generalized Sampling Theorem. In the general case, however, it is not the signal s(t) which is sampled but the pre-emphasized signal which has passed through a filter with a transfer characteristic 1/R(f). Thus in the following explanations we can use the Sampling Theorem, only with the reservation that in the general case one would have to put "pre-emphasized signal" in the place of "signal."

Fig. 3-The signal vector in the signal space. The co-ordinates are the signal amplitudes at the sampling points. All analytical operations on the signal can be expressed as algebraic operations on the signal coordinates.
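The Sampling Theorem invites a direct numerical check. The sketch below is mine, not the paper's; the waveband F, the test signal and the truncation of the series are invented for illustration. It rebuilds a band-limited signal from its samples at intervals 1/2F by summing the elementary sinc signals u_n:

```python
import numpy as np

# Sketch: Shannon's Sampling Theorem in action. A signal whose spectrum
# lies well inside (-F, F) is rebuilt from samples taken 1/2F apart by
# summing sinc-shaped elementary signals centred on the sampling points.

F = 4.0                        # waveband limit, cycles per second
dt = 1.0 / (2.0 * F)           # sampling interval 1/2F
n = np.arange(-200, 201)       # sampling points t_n = n/2F
t_n = n * dt

def signal(t):
    # a smooth pulse; its spectrum is negligible outside (-F, F)
    return np.exp(-t ** 2 / 8.0) * np.sin(2 * np.pi * 1.3 * t)

samples = signal(t_n)

def cardinal(t):
    # [s](t) = sum_n s(t_n) u_n(t); np.sinc(x) = sin(pi x)/(pi x)
    return np.sum(samples * np.sinc(2 * F * (t - t_n)))

for t in (0.05, 0.13, 0.4):
    print(signal(t), cardinal(t))    # the two agree closely
```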

Following Hilbert and Shannon, we can now represent any arbitrary signal in a frequency band F and in a time interval T as a point in a Euclidean space of 2FT dimensions, with the expansion coefficients c_n as co-ordinates. (See Fig. 3 above.) Strictly speaking, this is correct only if the number of dimensions is infinite, but the error is small if 2FT is reasonably large. Using the sinc functions as basis, it is legitimate to consider the co-ordinate system as orthogonal and the energy of the signal as proportional to the squared length of the signal vector, because

$$\int_{-\infty}^{\infty} s\, s^{+}\, dt = \sum_n \sum_m s_n s_m^{+} \int_{-\infty}^{\infty} u_n(t)\, u_m^{+}(t)\, dt = \frac{1}{2F} \sum_n s_n s_n^{+}.$$
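The proportionality between energy and squared vector length can also be checked numerically. In this sketch (mine; the pulse is invented and only approximately band-limited) the energy integral of the cardinal function is compared with 1/2F times the sum of the squared samples:

```python
import numpy as np

# Sketch: for a signal built from sinc functions the energy integral
# equals (1/2F) * sum of squared samples, i.e. the squared length of
# the signal vector measures the energy of the signal.

F = 4.0
dt = 1.0 / (2.0 * F)
n = np.arange(-400, 401)
samples = np.exp(-0.5 * (n * dt) ** 2)       # samples of a smooth pulse

energy_algebraic = np.sum(samples ** 2) * dt  # (1/2F) * sum s_n^2

# direct numerical integration of the cardinal function's energy
t = np.linspace(-10.0, 10.0, 4001)
s = np.array([np.sum(samples * np.sinc(2 * F * (tt - n * dt))) for tt in t])
energy_integral = np.sum(s ** 2) * (t[1] - t[0])

print(energy_algebraic, energy_integral)      # the two agree closely
```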

One may object that a physical filter with exactly zero response outside a finite waveband is also impossible. In this case there will remain a certain mean-square difference between the filter response to the original signal and to the cardinal function, which can be made very small for good bandpass filters. One also obtains an estimate of the error committed by limiting the expansion to some finite time interval T instead of extending it to infinity.
¹ The waveband of the cardinal function is 2F, but as it is real, only half this waveband is utilized. One can get rid of this ballast by the device of the "analytical signals," introduced by the author in 1946 and much studied by Ville (1948, 1950), whose spectrum is zero for negative frequencies.

For real signals the energy is the sum of the squares of the orthogonal signal components; in this case, of the samples. Another orthogonal representation is by Fourier components; one passes from one representation to the other by a rotation of the axes in 2FT dimensions. Replacing a signal by its expansion coefficients or coordinates has a momentous consequence: at this point analysis stops and hands over to algebra, because all analytical operations on the signal can be replaced by algebraic operations on the signal coordinates. Consider differentiation with respect to time as an example. The time derivative of the generalized cardinal function is


$$\frac{d}{dt}\,[s(t)] = \sum_n c_n\, \frac{d}{dt}\, r(t - t_n) = \sum_n c_n\, \dot{r}(t - t_n).$$

But as the spectrum of the derivatives ṙ extends over the same frequency interval as the spectrum of the response functions r, we can expand them in terms of the r, and obtain a representation of the derivative in terms of the r_n, with certain new expansion coefficients c̄_n. These are connected with the c_n by a linear transformation,

$$\bar{c}_m = \sum_n \alpha_{mn}\, c_n,$$

with the transformation matrix α_mn. In the case of sinc functions this is, as first shown by Oswald (1949),

$$\alpha_{mn} = \frac{(-1)^{m-n}}{m - n}, \quad m \neq n; \qquad \alpha_{nn} = 0.$$
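As a numerical illustration (mine, not the paper's), the sketch below applies Oswald's matrix to the samples of a band-limited signal. The factor 2F that turns the dimensionless matrix into a time derivative is my assumption, consistent with samples spaced 1/2F apart:

```python
import numpy as np

# Sketch: differentiation as an algebraic operation on the samples,
# using alpha[m, n] = (-1)^(m-n) / (m - n), alpha[n, n] = 0.

F = 4.0                        # waveband limit, cycles per second
dt = 1.0 / (2.0 * F)
N = 512
t = np.arange(N) * dt

m, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
with np.errstate(divide="ignore", invalid="ignore"):
    alpha = np.where(m == n, 0.0, (-1.0) ** (m - n) / (m - n))

f0 = 1.0                       # test tone well inside the waveband
s = np.sin(2 * np.pi * f0 * t)
ds = 2 * F * (alpha @ s)       # algebraic differentiation of the samples

exact = 2 * np.pi * f0 * np.cos(2 * np.pi * f0 * t)
interior = slice(N // 4, 3 * N // 4)        # away from the truncated ends
print(np.max(np.abs(ds - exact)[interior]))  # the error stays modest
```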

The differentiation is illustrated in Fig. 4; also the second derivative. If the operations are nonlinear, the theorem that all operations are algebraic still holds, but it must be interpreted somewhat differently. Nonlinear operations applied to an elementary set r(t − t_n) produce new functions, whose spectrum need not be contained in the original waveband. But in every case we are interested, in the end, only in a certain waveband F′, which need not be identical with the original F, but is also finite. The final result of the operations is some signal in F′, and if this is expanded in terms of the response functions of this band, it is always found that the final co-ordinates are connected with the original ones by a chain of algebraic equations.

Fig. 4-Differentiation of a signal with respect to time as an example of a simple analytic operation.

We can sum this up by saying that analysis serves only for setting up the conceptual frame; once this frame is set up, ready to receive facts of experience, all relations which link observational data with one another must be algebraical. This is a conclusion which probably will be of great importance far beyond the limits of structural communication theory.

Before leaving this field and passing on to the statistical theory, it will be fair to draw attention to the fact that between the two there is a sort of no man's land which hitherto has not been satisfactorily explored. The structural theory deals with amplitudes, the statistical theory with signs or symbols, and there is a logical gap between these which needs filling. A few remarks on this subject will be found in Appendix II.

STATISTICAL COMMUNICATION THEORY

In Shannon's statistical communication theory the difficulties of the adequate specification of symbols are short-circuited by the assumption that the signals can be recognized by the receiver with certainty, or with certain known probabilities. A further and equally important element in Shannon's theory is that it is not concerned at all with uncertain recognition by conjecture, but only with that part of the information which can be transmitted with certainty. The fact that certainty can arise out of uncertainty is one of the most surprising and important revelations of the statistical theory. As a measure of information Shannon has adopted R. V. L. Hartley's (1928) definition, that transmission of information means the establishment of agreement between a transmitter and a receiver regarding certain independent questions, which can all be answered with a "yes" or a "no." The amount of information transmitted is the number of correct yes-or-no answers, "correct" meaning that they establish agreement. Each answer counts as one "bit," which is short for "binary digit." To give this somewhat academic-sounding question a more practical turn, assume that the sender and the receiver agree that they will use the capacity of the communication channel only for independent messages. Such an agreement is called ideal coding. It might appear a very difficult problem how to extract the independent components of a message, but there exists a simple short cut to the solution: consider only very long messages, emitted by a stationary source. Very long messages must become, in the limit, independent of one another, and also in themselves, as regards their parts, because the orthographic, grammatical, logical, etc. chains in every language are bound to decay after some length, so that in the end the interval in which prediction is possible must become vanishingly small relative to the length of the message. Consider long messages, containing N symbols, where N is a large number. Any such message will contain the n symbols i of the language in very nearly the same proportions p_i; it is only the order of symbols which distinguishes one message from another. The probability p of any such message is then

$$p = \prod_i p_i^{N p_i},$$

which is the same for all; hence the number of different messages of this length N is M = 1/p. Thus any message can be considered as a choice from M possible choices, each with the same probability. By how many independent questions could we have picked out one of the M messages? The best procedure is to ask questions such as, "Is the number in the upper or in the lower half of M?" and so on. The number of questions which must be answered is the integer next highest to log₂ M, and if M is a very large number this will be nearly enough log₂ M. Hence the information, that is the number of yes or no answers transmitted in this ideal coding, is, in the limit,

$$\frac{1}{N} \log_2 M = \frac{1}{N} \log_2 \frac{1}{p} = -\sum_{i=1}^{n} p_i \log_2 p_i \quad \text{bits/symbol}.$$


This is Shannon's Coding Theorem for the noiseless channel. The mathematical expression at the right is called the entropy of the source relative to the symbols i. It is a maximum if the symbols have equal probability, in which case it is equal to log₂ n. As an example, a letter-symbol in a 32-letter alphabet can ideally carry 5 bits of information. On the other hand, Shannon estimates that the letter-entropy in the English language is only about one bit/letter; in other words, the redundancy is about 80 per cent. This is a considerable waste in the noiseless channel; in the noisy channel it is a welcome safeguard, as one can guess almost the whole text if 80 per cent of the letters are missing. Though the ideal code consists of infinitely long messages, whose coding and decoding time is also infinite, the ideal can be sufficiently approximated for most practical purposes by much shorter codes. As Shannon and R. Fano (1949) have shown, it is possible to approach the ideal within 1/N bits/symbol by a suitably constructed code.

It is a somewhat surprising result that it is not necessary to transmit every symbol in order to reconstruct a text completely, but it was even more surprising when Shannon showed that it is possible to transmit information at a certain finite rate through a noisy channel. Naively one would have expected that once symbols are made uncertain by noise no certainty is possible, and at best we can make a good guess at what the message was meant to be. In fact this is true for any finite message. It is too often forgotten that Shannon's remarkable result applies only to infinite messages. Let i be a symbol transmitted by the sender, which by noise can be distorted so that the received symbol is j. The transmission process is completely described statistically if we know for every symbol-pair i, j the probability p(i, j) of their joint occurrence. From this we can derive the probability with which the signal i is sent,

$$p(i) = \sum_j p(i, j)$$

(which we have called previously p_i), and also the probability that the symbol j will be received,

$$p(j) = \sum_i p(i, j).$$

We can also construct the conditional probability that j is received when i has been sent,

$$p_i(j) = \frac{p(i, j)}{\sum_j p(i, j)} = \frac{p(i, j)}{p(i)},$$

and its inverse, the probability that the signal i was sent when j is received,

$$p_j(i) = \frac{p(i, j)}{\sum_i p(i, j)} = \frac{p(i, j)}{p(j)}.$$

With these elements one can construct several entropy-expressions. The first of these is the entropy of the source, which we have met before, and which we now write with Shannon, using the symbol x for the messages as sent,

$$H(x) = -\sum_i p(i) \log_2 p(i) \quad \text{bits/symbol}.$$

This can be considered as the uncertainty per symbol of the source. The information is just the answer to this uncertainty, and numerically equal to it. Next one can define the conditional entropy, with y as the symbol for the messages as received,

$$H_y(x) = -\sum_i \sum_j p(i, j) \log_2 p_j(i) \quad \text{bits/symbol}.$$

This can be considered as the average uncertainty which remains after the messages have been received. It is also called the equivocation. Note that this is zero if there is no noise, because then p(i, j) and p_j(i) vanish unless i = j, and for i = j the conditional probability is unity. In his General Coding Theorem Shannon states that even in a noisy channel one can still transmit

$$H(x) - H_y(x) \quad \text{bits/symbol}$$

with any degree of certainty, that is with arbitrarily small error, provided that the coding is ideal, which, in general, means that the messages are coded and decoded in infinite lengths. We will show later, in an instructive example, how certainty can arise from uncertainty by the laws of large numbers. It is important to note that in the noisy channel we can improve the rate of transmission up to a point not only by coding, but also by a new process which is called matching. The best matching is the optimum coordination of the symbols emitted by the source, for instance letters, speech sounds, etc., with the symbols used in the transmission, for instance sequences of amplitude values, or of instantaneous frequencies. In the noiseless channel this problem did not exist, because there one transmission symbol was as good as any other, but in the noisy channel, if a symbol is more often distorted than another, it will be advantageous to use it less often. If one knows the distortion probabilities p_i(j), one can calculate the set of p(i) which makes the equivocation a minimum, though never zero! One can then proceed to match the message symbols with the transmission symbols by searching for the best one-to-one coordination which makes the actual frequencies as nearly as possible equal to the calculated ones. One can consider this as a second coding process.

So far we have discussed discrete channels only. The continuous channel with noise does not represent an essentially different problem, while the continuous noiseless channel would, of course, possess infinite capacity, and must therefore be excluded as a nonphysical abstraction.
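These entropy expressions are easily computed from the joint probability table p(i, j). The sketch below is mine; the table is an invented binary symmetric channel with a 10 per cent error rate:

```python
import numpy as np

# Sketch: source entropy H(x), equivocation H_y(x), and the rate
# H(x) - H_y(x), all from the joint probabilities p[i, j] of sent
# symbol i and received symbol j.

p = np.array([[0.45, 0.05],
              [0.05, 0.45]])           # p(i, j); the entries sum to 1

p_i = p.sum(axis=1)                    # p(i), symbol i is sent
p_j = p.sum(axis=0)                    # p(j), symbol j is received
H_x = -np.sum(p_i * np.log2(p_i))      # source entropy, bits/symbol

cond = p / p_j[np.newaxis, :]          # p_j(i) = p(i, j) / p(j)
H_yx = -np.sum(p * np.log2(cond))      # equivocation, bits/symbol

print(H_x, H_yx, H_x - H_yx)           # 1.0, ~0.47, ~0.53 bits/symbol
```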


Consider a signal space with a very great number of dimensions. A signal without noise is represented by a definite vector; a signal with noise by some probability distribution, as indicated in Fig. 5. One might now think that no signal can ever be identified with complete certainty, because no noise vector, however large, is impossible; it only becomes very improbable. But this is only because we are not used to thinking in spaces with a very great number of dimensions. In such spaces any one component, say n_k, can have the familiar spreading type of probability distribution, but the quadratic mean of all components,

$$\frac{1}{2FT} \sum_{k=1}^{2FT} n_k^2,$$

will be very close to the limit $\overline{n^2}$ for 2FT → ∞, by the law of large numbers. In the limit the noise vector will terminate exactly on a hypersphere with a quite definite radius. Inside this hypersphere there is complete uncertainty, but outside there is complete certainty; if we find the signal-plus-noise vector outside this volume the signal cannot be the one corresponding to its center.

Fig. 5-Signal vector and noise in signal space. When the number of dimensions of the signal space goes to infinity, the noise vector will be located in an ever-narrowing shell on a hypersphere with a radius equal to the rms noise power.

If now we have a certain probability distribution of signal amplitudes, the signal vector by itself will fill a certain hypersphere around the origin with a certain density. Imagine this volume subdivided into the uncertainty cells which we have just discussed, and the problem is reduced to the discrete, noiseless case. With each cell we can associate a distinct transmission symbol, which will be identifiable with certainty, and we need only calculate the entropy from the known probability distribution.

In the special case in which both the signal and the noise amplitudes have independent Gaussian distributions, this argument leads to the Shannon-Tuller formula for the channel capacity,

$$C = TF \log_2 \frac{P + N}{N} \quad \text{bits},$$

where P is the mean signal power and N the mean noise power. Many other, more complicated problems have been calculated by Jelonek (1953) in an exhaustive discussion of transmission systems.
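A short sketch (mine, with invented figures) of the Shannon-Tuller formula, together with a small sampling experiment showing how the noise vector concentrates on a hypersphere as the number of dimensions grows:

```python
import numpy as np

# Sketch: capacity C = T F log2((P + N)/N) bits in a time interval T,
# and the law-of-large-numbers effect behind the "uncertainty cells".

def capacity_bits(T, F, P, N):
    return T * F * np.log2((P + N) / N)

print(capacity_bits(T=1.0, F=3000.0, P=100.0, N=1.0))   # ~19,974 bits

rng = np.random.default_rng(4)
for dims in (10, 100, 10000):          # dims plays the role of 2FT
    noise = rng.normal(size=(200, dims))             # unit-power noise
    radii = np.sqrt(np.mean(noise ** 2, axis=1))     # rms of components
    print(dims, radii.std())           # the shell narrows as dims grows
```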
FILTERING, PREDICTION AND RECOGNITION

Shannon's statistical communication theory, of which I have given a very cursory sketch, has often been compared with thermodynamics, because it gives ideal limits for communication systems, as does thermodynamics for heat engines, but it gives little help or advice regarding the best way of approach to the ideal limits. This comparison is perhaps not quite fair, because by pointing out the possibility of transmission without error in noisy systems, Shannon's theory has already proved a powerful driving force. But another objection remains valid. Shannon's theory tells us only how to improve communication by agreement between the sender and the receiver, by coding and by matching. But what can we do if an imperfect agreement has been reached, which nevertheless cannot be altered, as for example in the transmission of speech through the telephone? In this case the helpful advice comes not from Shannon's theory, but from a branch of communication theory older than Shannon's: from the theory of filtering, powerfully initiated in 1942, independently of one another, by N. Wiener and by A. Kolmogoroff.

The first problem is this: we have a message x(t) on which there is superimposed a noise y(t), so we only know z(t) = x(t) + y(t), up to the present instant. We want to separate the message from the noise as well as possible, and also want to predict its future values, as far and as well as possible. The problem is evidently of enormous generality. Everything depends on what operations we have at our disposal and how much we know of the statistical structure of the two time series x and y. The problem is also somewhat ill-defined; we have not agreed what criterion to use regarding the optimum solution. It might appear natural to use the criterion of minimum equivocation, at any rate for the filtering problem, but this leads into great mathematical difficulties. Wiener and Kolmogoroff have chosen instead the mean-square criterion, that is, the mean-square error in the purified signal shall be as small as possible. This criterion is mathematically convenient, it applies equally well to filtering and to prediction, and, as I want to show later on, it retains its advantages far beyond the problem of optimum linear filtering to which Wiener and Kolmogoroff applied it in the first place.

Norbert Wiener's elegant solution of the optimum linear filter problem is rightly considered as perhaps the finest application of mathematics to communication problems, but in this very fact there is an element of discouragement. If it takes such a brilliant mathematical effort to obtain the optimum linear filter, which from the practical point of view is a rather modest first step, what hope is there to design the more powerful optimum nonlinear filters by analysis? I want to show now that


there is no need to be discouraged, because the optimum linear filter can be obtained, as nearly as does not matter, by a simple algebraical and numerical method which can be extended also to nonlinear filters. The difficulty of the Wiener-Kolmogoroff approach is just that it is analytical, and it is analytical because it operates in the infinite frequency range. In the finite frequency range, in which all operations are algebraical, the difficulties vanish; also, I regret to say, the elegance. But one need not care too much about elegance in communication theory which, after all, is not only a mathematical work of art but also a handmaiden of engineering. The rigor goes by the board too, but rigor is something which belongs entirely to the realm of thought; it can be achieved only at the cost of a loss of reality, in the present case by introducing the nonphysical infinite waveband. It is the old dilemma between solving an idealized problem exactly or a real problem approximately. In the present case I think all the practical advantages are on the side of the realistic approximation.

For simplicity let us operate in the rectangular finite waveband, because this enables us to use the very simple expansion

$$[s(t)] = \sum_{t_n \le t} s(t_n)\, u(t - t_n).$$

Here we have committed a small error by cutting off the series at the instant t (the present), thus neglecting the effect of the u-pulses which belong to future instants t_n. If instead we had expanded s(t) into physical responses, which are exactly zero before the stimulus, we could have suppressed this error, but at the cost of using a more complicated rule for the expansion coefficients.²

² It is sufficient if the response decays to an insignificant value in a time interval of less than 1/2F before the stimulus, because times shorter than 1/2F have no meaning in the waveband F.

Let us now put the cardinal signal through a linear filter with the same bandwidth, which has an impulse response r(t). The output of the filter is

$$[\bar{s}(t)] = \int_{-\infty}^{t} [s(\tau)]\, r(t - \tau)\, d\tau = \sum_{t_n \le t} s(t_n)\, r_u(t - t_n),$$

where r_u(t − t_n) is the response of the filter to a u-pulse or sinc signal centering on t_n. We will now suppress the suffix u, with the understanding that we are talking of this type of response function. As the signal, as well as the response of the filter to it, are completely described by their values at the sampling times t_n, we can now exclude all time instants from consideration except the sampling times. It will be convenient to describe these by a suffix n = 2F(t − t_n), where n is a positive integer. The present instant, which is also a sampling point, corresponds to n = 0, as illustrated in Fig. 6. We will not let n run to infinity but only to some sufficiently large number N. Writing now s_n for s(t_n) and r_n for r(t − t_n), the signal behind the filter is

$$s(0) = \sum_{n=0}^{N} s_n r_n. \tag{1}$$

For simplicity we have also suppressed the square bracket, as it is understood that we are always talking of cardinal functions in the finite waveband.

Fig. 6-Algebraic treatment of a linear filter. The transformed signal at the present instant is given by s(0) = Σ s_n r_n, where the s_n are the signal samples and r_n is the filter response to a sinc signal after n intervals of length 1/2F.

The mean-square criterion now appears in the form

$$\overline{[s(0) - x(d)]^2} = \text{min}, \tag{2}$$

where d is an integer, positive for delays, negative for prediction. For uniformity we will write x_d instead of x(d). After these explanations the calculation of the optimum filter, characterized by its response samples r_n, is performed in a few lines. We obtain directly

$$\overline{[s(0) - x_d]^2} = \sum_{n=0}^{N}\sum_{m=0}^{N} \big(\overline{x_n x_m} + \overline{x_n y_m} + \overline{x_m y_n} + \overline{y_n y_m}\big)\, r_n r_m - 2 \sum_{n=0}^{N} \overline{x_d (x_n + y_n)}\, r_n + \overline{x_d^2} = \text{min}. \tag{3}$$

We have written below the averaged coefficients their usual interpretation: the first bracket is the autocorrelation function φ_ss(n − m) of the raw signal s, the second average is the cross-correlation function ψ_xs(n − d) with the message x. Like all other functions in this treatment, these are now defined for the sampling points only, i.e., for integer values of their arguments. The result is a positive-definite quadratic form in the response samples r_n. Minimizing this expression leads to a system of N+1 linear equations for the N+1

values r_n:

$$\sum_{m=0}^{N} \phi_{ss}(n - m)\, r_m = \psi_{xs}(n - d), \qquad n = 0, 1, \ldots, N. \tag{4}$$

These equations are the algebraical equivalent of the Wiener-Hopf integral equation

$$\int_0^{\infty} r(\tau)\, \phi_{ss}(t - \tau)\, d\tau = \psi_{xs}(t - \Delta).$$
To (4) we cannot apply Wiener's ingenious method of solution, for a reason which sharply illuminates the essential difference between the algebraic and the analytical approach. At one step in Wiener's derivation use is made of the complex-frequency plane, and of Liouville's theorem, that a function which has no poles anywhere in the plane must be a constant. But we know only a strip of the frequency plane, and a function which has no poles in a finite strip, however large, need not be a constant, even approximately. On the other hand we have in (4) an ordinary system of linear equations, though possibly a rather large number of them, which can be solved in every concrete case by numerical methods. One might think that this will be a matter of inverting large matrixes, and that the problem may have to be referred to giant electronic computers, but this is not the case. Eqs. (4) are not an arbitrary system of linear equations, but have a symmetrical matrix, because they have been derived from a minimum condition, and this means that they can be tackled by Southwell's relaxation method. In this method one starts from some first guess, and modifies the parameters r_n one by one, beginning with the one which has the largest effect on the quantity to be minimized, then proceeding to the next, and so on; a sketch of the process is given below. The surprising efficiency of this method is well known. Even with paper, pencil and a small desk-computer it is possible to tackle 50-100 equations of this type, and comparatively simple automatic computers can be programmed to solve problems of even greater complexity. Once the response samples r_n are obtained, network synthesis can be started by well-known methods. This may be very laborious, but it is of interest to note that unless the problem is that of producing a cheap and simple filter, adapted to a special task, there is no need to do it, because a very general realization exists, which does not involve any calculation.
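The relaxation process can be put in a few lines of code. The sketch below is mine: the correlation values are invented stand-ins for φ_ss and ψ_xs, and the largest-residual rule is Southwell's choice of the parameter with the largest effect:

```python
import numpy as np

# Sketch: solving sum_m phi_ss(n-m) r_m = psi_xs(n-d), a symmetric
# positive-definite system, by Southwell relaxation: repeatedly pick
# the equation with the largest residual and zero it by adjusting the
# corresponding response sample.

rng = np.random.default_rng(0)
N = 20                                   # response samples r_0 ... r_19

lags = np.arange(N)
phi = 0.5 ** lags                        # invented phi_ss(k) = 2^-k; the
                                         # Toeplitz matrix it generates is
A = np.array([[phi[abs(i - j)] for j in range(N)]   # positive-definite
              for i in range(N)])
b = rng.normal(size=N)                   # stands in for psi_xs(n - d)

r = np.zeros(N)                          # first guess: all responses zero
for step in range(200):
    residual = b - A @ r
    k = int(np.argmax(np.abs(residual)))   # "knob" with largest effect
    r[k] += residual[k] / A[k, k]          # set it to its local optimum

print(np.max(np.abs(A @ r - b)))           # residual driven toward zero
```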

The realization follows directly from the equation for the filter response,

$$s(0) = \sum_{n=0}^{N} s_n r_n.$$

One takes a delay line, and taps it at intervals 1/2F. At these points one applies amplifiers with such high entrance impedance that their reaction on the delay line shall be negligible (for instance, amplifiers whose first element is a cathode follower). The amplified current signals, proportional to r_n s_n, are added up (for instance by anode followers). Fitted with variable r_n, which can be made positive or negative, this is a general filter which can, within limits, imitate any linear filter.³

Little would be gained, however, if the algebraic method could serve only for deriving optimum linear filters, because it is well known that their discriminating power is not very large, for evident reasons. The only information on the statistical structure of the message and of the noise which is made use of by a linear filter is contained in the autocorrelation function φ_ss and in the cross-correlation function ψ_xs. This means, in essence, that we know only a power spectrum and a cross-power spectrum, but this is of course all that a linear filter can deal with. One can also say that the information in φ_ss and in ψ_xs contains just a rudiment of shape discrimination, just enough to recognize simple echoes. If there are recurrent symbols in the signal s, consisting of two peaks, the autocorrelation function will indicate them as one peak. As illustrated in Fig. 7, one can also tell whether the peaks had the same sign or not, but one cannot tell in which order they followed on one another. Similarly ψ_xs will have a peak if the noise consists of a simple echo. This is evidently very rudimentary information. One understands that linear filters may perhaps eliminate the needle scratch in a gramophone record, but they cannot pick out a cough in a symphony.

Fig. 7-Autocorrelation functions of different order. (a) and (b) First-order correlograms. (c) The second-order correlogram φ(m, n) completely specifies a triad of peaks. Positive values are shown as dots, negative values as circles.

One way of obtaining better discrimination is the introduction of higher-order correlation functions. Fig. 7 illustrates the way in which recurrent groups of three peaks, triads, are recorded in the second-order autocorrelation function

$$\phi(n_1 - n_2,\; n_1 - n_3) = \overline{s_{n_1} s_{n_2} s_{n_3}}.$$

This is a function of two variables, n = n₁ − n₂ and m = n₁ − n₃, defined only for integer values of the arguments. It contains a full specification of the triad, even a redundant one, because of its symmetry relative to the axis m = n. One can conclude that correlation functions of the k-th order are sufficient for the specification of symbols containing k peaks.

³ They are known as transversal filters and have been studied in particular by Kallmann and by Espley.
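The second-order correlogram is equally easy to estimate from a record. In this sketch (mine, with an invented pulse train) a recurrent triad leaves its signature in φ(n, m):

```python
import numpy as np

# Sketch: phi(n, m) = mean of s[n1] * s[n1 - n] * s[n1 - m], the
# second-order autocorrelation, which records order and spacing of
# recurrent peak groups, unlike the first-order correlogram.

def second_order_correlogram(s, max_lag):
    L = len(s)
    phi = np.zeros((max_lag, max_lag))
    for n in range(max_lag):
        for m in range(max_lag):
            top = max(n, m)
            phi[n, m] = np.mean(s[top:L] * s[top - n:L - n] * s[top - m:L - m])
    return phi

rng = np.random.default_rng(1)
s = np.zeros(2000)
starts = rng.integers(0, 1990, size=60)
s[starts] += 1.0                    # triads: peaks at offsets 0, 3, 7
s[starts + 3] += 1.0
s[starts + 7] += 1.0

phi = second_order_correlogram(s, 10)
print(np.round(phi, 3))   # the entries at (4, 7) and (7, 4) pin down the triad
```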


Consider now an approach in which these higher-order correlation functions are introduced in the simplest possible way.⁴ It is obvious to extend the transformation equation of the linear filter,

$$s(0) = \sum_{n=0}^{N} s_n r_n,$$

as follows:

$$s(0) = \sum_{n=0}^{N} s_n r_n + \sum_{n_1=0}^{N}\sum_{n_2=0}^{N} s_{n_1} s_{n_2}\, r_{n_1 n_2} + \sum_{n_1=0}^{N}\sum_{n_2=0}^{N}\sum_{n_3=0}^{N} s_{n_1} s_{n_2} s_{n_3}\, r_{n_1 n_2 n_3} + \cdots \tag{5}$$
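Read concretely, the first two terms of (5) form a quadratic transducer acting on the last N+1 samples. A minimal sketch (mine, with invented response tables):

```python
import numpy as np

# Sketch: the linear term s_n r_n plus the quadratic term
# s_n1 s_n2 r_n1n2 of expansion (5), evaluated at the present instant.

N = 8
rng = np.random.default_rng(2)
r1 = rng.normal(size=N)               # first-order responses r_n
r2 = 0.1 * rng.normal(size=(N, N))    # second-order responses r_n1n2

def filter_output(s_recent):
    """s_recent[n]: the sample n intervals of 1/2F before the present."""
    return s_recent @ r1 + s_recent @ r2 @ s_recent

print(filter_output(rng.normal(size=N)))
```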

We have introduced here response functions of higher order, r_{n₁n₂}, ..., which appear as the coefficients of the products of signal samples, taken at 2, 3, ... different instants. There is no need to give them distinctive symbols; like the higher-order correlation functions they are sufficiently characterized by the number of their arguments. A filter in which only the coefficients r₀₀, r₀₀₀, ... are different from zero is an instantaneous nonlinear transducer. If one now applies the mean-square criterion, one obtains a quadratic expression of the responses of different order, with higher-order auto- and cross-correlation functions as coefficients. The calculation is carried out in Appendix III. Minimizing leads again to a set of linear equations, which can be considered as the algebraic equivalents of certain integral equations for optimum nonlinear filters due to L. A. Zadeh (1953). Of course, one would even less think than before of solving these by inverted matrixes.

However, even the relaxation method is out of its depth at this point. The difficulty is not so much in the large number of equations, though this is large enough. Even with only 10 sampling points one has 10 responses of the first order, 45 of the second order, 240 of the third order, and so on, with an equal number of equations. But the real difficulty is that in order to establish these equations one must collect data on higher-order correlation functions of the order of the square of these already large numbers. At this point it looks as if even the most gigantic electronic computers would have to declare their incompetence. Added to which, even the transversal realization of the nonlinear optimum filter is a formidable task, because it must contain a very great number of elements in which the products of the signal samples must be formed, and these tend to become very complicated.

There may be, however, a short cut through all these difficulties. Let us take a hint from nature. The lower animals are born with a complete set of instincts, ready for every emergency, while the higher animals, especially man, are born with a more or less blank mind and have to learn. Like many other people, I believe that automatic computers will soon have reached that stage of complication beyond which it does not pay to produce them ready-made; instead, they must be made capable of learning. We have very probably reached this stage with nonlinear filters. Let us therefore construct nonlinear filters with many adjustable parameters and let the filter adjust itself to the optimum by a learning process.

Fig. 8-Learning filter for noise elimination, prediction, or recognition.

Fig. 8 is a schematic diagram of such a learning filter. A long sample of the signal s = x + y is kept in a store, illustrated as a magnetic tape, and is sent periodically through the filter. At the same time a record of the message x alone is played: advanced, with a certain delay, if the filter is to be trained as a noise-eliminator; retarded, for training as a predictor. The output of the filter and the advanced or delayed message are fed into a comparator, which forms their difference, squares it, and integrates it over the whole record.

The filter has a great number of adjustable elements, which can be varied, e.g., by turning knobs. These are worked by the adjustor, which is backed by a minimum computer, which in turn is fed with the mean-square error. Let the instructions be such that this machine carries out the Southwell relaxation process, as follows: the error is first measured for some arbitrary starting adjustment. Next, each knob is in turn given two different positions, all the others remaining at their original settings, and the errors are measured again. These values are stored, and the minimum computer calculates and stores in addition the optimum setting for every knob, and the corresponding value of the error. After all knobs have been tried out in turn, the adjustor finds in the store the number of the knob which has, by itself, produced the largest decrease of the error, and sets this to its precalculated optimum position. The calculation of the minimum, and of the corresponding setting, is a simple matter if all the adjustable parameters figure linearly in the filter output, as in that case each parameter by itself will produce a parabola, which can be calculated from two points in addition to the original setting. The play is now repeated until a point is reached when further adjustments produce no perceptible decrease in the mean-square error. This process is slow, but it is absolutely certain to converge, except for the danger that the records or some other part of the apparatus may be worn out by the time the learning process is finished. This may be a quite serious danger, because 100 settings in a filter with 100 adjustments make it necessary to play the record 10,000 times, and if the record is an hour long, the valves in the filter may be approaching the end of their life. This suggests very forcibly that it does not pay to make the machine very complicated, in the hope that if it is complicated enough, it will be able to do anything we ask from it. Any prior knowledge of the signal and noise ensemble, suitably embodied in the design of the filter, is likely to pay high dividends. For instance, if we want a machine to filter out English speech from a noisy background, it might do it reasonably well for one speaker, from a record only a few minutes long, but if we want it to do this with any speaker, the record will have to be extremely long, and the result is doubtful, as there will be less and less common characteristics on which the machine can rely. It will pay, therefore, to feed the filter not with speech amplitudes but with the invariants of speech signals, discussed in Appendix II. With the reservation that the designer puts a fair amount of prior knowledge into its design, I believe that such learning filters are feasible. They will certainly make the best of their intrinsic ability; how far this will reach depends entirely on the design and on the nature of the statistical material. I am not concerned here with the question whether they will be practical; all I wanted to give is what the mathematicians call an "existence proof," but of course without the rigor which mathematicians expect of such proofs.

I have mentioned that the filter can be trained for noise elimination (smoothing), or for prediction. It comes, perhaps, somewhat as a surprise that it can be trained also for recognition. Imagine, as an example, that the record is speech, and that the filter is expected to recognize the morpheme "pa." In principle we have only to take a long speech record s and produce the message record x from it by blanking out in it everything except the pa's.⁵ The machine will now follow the instruction which we have given it, "repeat only the pa's, cut out the rest," to the best of its ability, and in the end it will learn to give some sort of signal when a "pa" comes along, and only then. This we can legitimately, though perhaps somewhat metaphorically, express by saying that it has recognized the morpheme. Of course, I do not suggest that this may be a practical way to the solution of the problem of speech recognition, one of the most intriguing and most difficult problems of present-day communication theory. I only wanted to suggest that the path first opened up by Norbert Wiener, consequently pursued, may lead much farther than one might have thought at first.

With the last examples I have stepped over the boundary of communication theory, and have penetrated some way into the domain of cybernetics, which can be defined as the science of purposeful action, based on information. This involves collating data of experience and taking decisions, with the purpose of adjustment to changing circumstances, and control over them. Filtering, prediction and recognition are necessary tools for any such mechanism or organism, but the theory of decisions and the theory of control are beyond them. I think it will be best to stop at this point. We have seen how a machine may be able to recognize speech; it would take us too far if we tried to discuss how it would answer back!

⁴ It is difficult to say how much of this is new, as so much written on this subject appears only in reports. From the abstracts circulated by the C.C.I.R. it appears that nonlinear filters operating on samples were first considered by H. E. Singleton in 1951, and after him by W. D. White, also 1951, in reports of the General Electric Co., Schenectady, and of the Airborne Instruments Lab., Mineola, respectively. (Editor's note: Dr. Gabor is here referring to papers presented by Singleton and White at the 1951 IRE convention. Singleton's work was published as Tech. Rpt. No. 160 of the Research Laboratory of Electronics, M.I.T., but White's original paper was never published. See Proc. NEC, vol. 9, p. 505; February, 1954, for a brief résumé. W.D.W.)
⁵ It must be, of course, a morpheme which does not become unintelligible if the rest is blanked out.
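As a concrete rendering of the learning procedure described above (my sketch, not Gabor's apparatus; the signals, the number of knobs and the step size are all invented), the relaxation with the parabola fit looks as follows. Because the knobs figure linearly in the filter output, the error really is an exact parabola in each knob, as remarked above:

```python
import numpy as np

# Sketch: a "learning filter" adjusted by Southwell relaxation. Each
# knob is tried at two displaced settings, a parabola is fitted to the
# three measured mean-square errors, and the knob promising the largest
# decrease is set to the bottom of its parabola.

rng = np.random.default_rng(3)
N = 6
x = rng.normal(size=2000)                  # message record
s = x + 0.5 * rng.normal(size=2000)        # raw signal = message + noise

S = np.array([s[n:len(s) - N + n] for n in range(N)])  # delay-line taps
target = x[N - 1:len(x) - 1]               # train as a noise eliminator
knobs = np.zeros(N)                        # adjustable parameters

def error(k):                              # comparator + squarer + integrator
    return np.mean((k @ S - target) ** 2)

h = 0.1                                    # trial displacement per knob
for play in range(100):                    # repeated plays of the record
    best_gain, best = 0.0, None
    for i in range(N):
        e0 = error(knobs)
        trial = knobs.copy()
        trial[i] = knobs[i] + h; e_plus = error(trial)
        trial[i] = knobs[i] - h; e_minus = error(trial)
        a = (e_plus + e_minus - 2 * e0) / (2 * h * h)   # parabola fit
        bq = (e_plus - e_minus) / (2 * h)
        if a <= 0:
            continue                       # no interior minimum
        gain = bq * bq / (4 * a)           # decrease at the parabola bottom
        if gain > best_gain:
            best_gain, best = gain, (i, knobs[i] - bq / (2 * a))
    if best is None or best_gain < 1e-12:  # no perceptible decrease: stop
        break
    knobs[best[0]] = best[1]

print(error(knobs))                        # mean-square error after learning
```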
APPENDIX I
THE EXPANSION THEOREM

Let the signal be s(t). A linear filter with impulse response function r(t) transforms this into

$$\bar{s}(t) = \int_{-\infty}^{t} s(\tau)\, r(t - \tau)\, d\tau. \tag{6}$$

As the response of any physical filter is zero for negative arguments, we could as well replace the upper limit t by any set limit larger than t. We could also as well replace −∞ in the lower limit by any time earlier than the start of the signal. We will set such limits, ∓T/2, and finally go with T to infinity. Replace the signal by its expansion in terms of the responses,

$$[s(t)] = \sum_n c_n\, r(t - t_n), \tag{7}$$

where the t_n are evenly spaced by intervals 1/2F. The response of the filter now becomes

$$[\bar{s}(t)] = \int_{-T/2}^{T/2} \sum_n c_n\, r(\tau - t_n)\, r(t - \tau)\, d\tau. \tag{8}$$


Consider now the error

$$[\bar{s}(t)] - \bar{s}(t) = \int \Big[\sum_n c_n\, r(\tau - t_n) - s(\tau)\Big]\, r(t - \tau)\, d\tau. \tag{9}$$

In the limit, for very large intervals T, we can use the Fourier integral formulas

$$\int_{-\infty}^{\infty} s(\tau)\, r(t - \tau)\, d\tau = \int_{-\infty}^{\infty} S(f)\, R(f)\, e^{2\pi i f t}\, df,$$

$$\int_{-\infty}^{\infty} r(\tau - t_n)\, r(t - \tau)\, d\tau = \int_{-\infty}^{\infty} R^2(f)\, e^{2\pi i f (t - t_n)}\, df.$$

Hence

$$[\bar{s}(t)] - \bar{s}(t) = \int_{-\infty}^{\infty} \Big\{\sum_n c_n\, R(f)\, e^{-2\pi i f t_n} - S(f)\Big\}\, R(f)\, e^{2\pi i f t}\, df. \tag{10}$$


Let us now calculate the mean quadratic error in the interval T. In order to make the formulas applicable also to complex signals, we form

$$\overline{\big|[\bar{s}(t)] - \bar{s}(t)\big|^2} = \frac{1}{T}\int_{-T/2}^{T/2} \big([\bar{s}(t)] - \bar{s}(t)\big)\,\big([\bar{s}(t)] - \bar{s}(t)\big)^{*}\, dt,$$

where we have used abbreviations for the bracket-expressions in the integral (10). We now carry out first the integration with respect to t and obtain a factor

$$\frac{\sin \pi (f_1 - f_2) T}{\pi (f_1 - f_2)},$$

which in the limit goes over into a delta function. The mean-square error now becomes

$$\overline{\big|[\bar{s}(t)] - \bar{s}(t)\big|^2} = \lim_{T\to\infty} \frac{1}{T}\int_{-\infty}^{\infty} R(f)\, R^{*}(-f)\,\Big|\sum_n c_n\, e^{-\pi i n f/F} R(f) - S(f)\Big|^2\, df. \tag{11}$$

For stationary signal-series this approaches a finite limit for T → ∞, because the energy spectrum S(f)S*(f), and the series which approximates it, will be proportional to T in the long run. To simplify the discussion let us now go back to real signals, and write the mean-square error accordingly.

Assume now that the spectral transfer function R(f) of the filter vanishes outside ±F. One can see immediately that inside this interval we can make the bracket expression vanish, by making the c_n equal to the coefficients in the Fourier series expansion of S(f)/R(f) in this interval, that is,

$$c_n = \frac{1}{2F}\int_{-F}^{F} \frac{S(f)}{R(f)}\, e^{\pi i n f/F}\, df. \tag{12}$$

Outside this interval R(f) vanishes, hence there is no difference, behind the filter, between the original signal and its expansion, the generalized cardinal function. This proves the Expansion Theorem. One can also see immediately that this can be expressed in the form of a generalized Sampling Theorem. Let us replace S(f) in the integral (12) by a function which is identical with S(f) inside, but zero outside, the limits ±F. We can now replace the limits ±F in (12) by ±∞, and it is seen, by Fourier's Theorem, that the c_n are the sampling values of the cardinal function of the pre-emphasized signal, with spectrum S(f)/R(f) inside and zero outside the frequency interval ±F. But by the Sampling Theorem these are also the samples of the pre-emphasized signal itself, which proves the generalized Sampling Theorem.

It is immediately seen from (11) that if R(f) is not exactly zero outside the limits ±F, the generalized cardinal function is not the best approximation by the mean-square criterion. It will be more revealing if we investigate the best approximation in time-language instead of in frequency-language. We go back to (9) and form directly the mean-square error

$$\frac{1}{T}\iiint r(t - \tau_1)\, r(t - \tau_2)\,\Big[\sum_n c_n\, r(\tau_1 - t_n) - s(\tau_1)\Big]\Big[\sum_m c_m\, r(\tau_2 - t_m) - s(\tau_2)\Big]\, dt\, d\tau_1\, d\tau_2. \tag{13}$$

This contains certain integrals,

$$\int_{-\infty}^{\infty} r(t - \tau_1)\, r(\tau_1 - t_n)\, d\tau_1 = r^{(2)}(t - t_n). \tag{14}$$

We have termed these r^(2) because these are the response functions, corresponding to the same stimulus-time series, of a filter with the spectral function R²(f), that is, of two filters in series.⁶ Using these responses, we also define an integral,

$$\int_{-\infty}^{\infty} r^{(2)}(t - t_n)\, r^{(2)}(t - t_m)\, dt = \chi^{(2)}(t_n - t_m). \tag{15}$$

This is the scalar product of two iterated filter responses, belonging to the instants t_n and t_m respectively. With these definitions we can now write (13) in the form

$$\overline{\big|[\bar{s}(t)] - \bar{s}(t)\big|^2} = \chi^{(2)}(0)\sum_n c_n^2 + \sum_{n \neq m} c_n c_m\,\chi^{(2)}(t_n - t_m) - 2\sum_n c_n \iint r(\tau_1 - t_n)\,\chi^{(1)}(\tau_1 - \tau_2)\, s(\tau_2)\, d\tau_1\, d\tau_2 + \iint \chi^{(1)}(\tau_1 - \tau_2)\, s(\tau_1)\, s(\tau_2)\, d\tau_1\, d\tau_2. \tag{16}$$

Here we have introduced also the function χ^(1), which is the scalar product of the responses r. It is now seen that we can determine the coefficients c_n in the optimum expansion, which makes the mean-square error a minimum, one by one, if and only if

$$\chi^{(2)}(t_n - t_m) = 0 \quad \text{for } n \neq m. \tag{17}$$

Thus in the general case not the responses, but the iterated responses, must form an orthogonal set. The rectangular filter, whose responses are sinc functions, is a singular exception, because in this case there is no difference between the filter and the iterated filter. If the r^(2) form an orthogonal set, the expansion coefficients for best approximation are determined from the equations

$$\chi^{(2)}(0)\, c_n = \iint r(\tau_1 - t_n)\,\chi^{(1)}(\tau_1 - \tau_2)\, s(\tau_2)\, d\tau_1\, d\tau_2. \tag{18}$$

One can express this by saying that it is not the signal s(t) but another,

$$u(t) = \int_{-\infty}^{\infty} s(\tau)\,\rho(t - \tau)\, d\tau, \tag{19}$$

which is expanded by the rule of orthogonal expansion. This is a smoothed signal, because its spectrum,

$$S(f)\, R(f)\, R(-f),$$

goes rapidly to zero outside a certain frequency range in reasonably good filters. In the special case of a rectangular filter the signal u(t) is identical with the cardinal function.

⁶ By the Theorem of Resultants, cf. G. A. Campbell and R. M. Foster, "Fourier Integrals for Practical Application," Bell Lab. Series, Van Nostrand, New York, Pair 202; 1948. We have used here throughout the very practical conventions of Campbell and Foster, with the difference that we always use lower-case letters s, r for the time-descriptions, upper-case letters S, R for their spectra.
APPENDIX II
WAVEFORMS AND COMMUNICATION SIGNS

This is necessarily a somewhat cursory discussion of the no man's land between the structural and the statistical theory, which has not yet been properly systematized. The signal space is a useful device for the representation of certain coding processes. An ideal code, for instance, is adequately represented by the association of one uncertainty cell to each signal. Transition from one ideal code to another is the cell-to-cell (not point-to-point) mapping of one signal space on another. Nonideal codes can also be represented to some extent. If S₁ is a first signal space with N₁ dimensions, in which the messages are ideally encoded, and S₂ a second space with N₂ > N₁ dimensions, any cell in S₁ will be represented in S₂, in general, by a domain of larger dimensionality. The redundancy in S₂ makes it possible to transmit the signals safely at a higher noise level. A signal point in S₁ is characterised by N₁ transformation equations

$$x_i = X_i(y_1 \ldots y_{N_2}), \qquad (i = 1 \ldots N_1).$$

Signal points in S₂ stand for the same communication sign so long as the functions X_i have certain values. This means that the y_k can suffer certain transformations, by faulty devices, or by noise, which do not affect the transmission at all. One can therefore consider the X_i as the invariants of the nondestructive transformations. The study of optimum redundant codes has led to some interesting results, such as Hamming's self-checking codes, which may produce an enormous reduction in error at the cost of a small increase of dimensionality. The signal-space concept remains useful, however, only if the messages are coded and decoded in certain agreed lengths, which are preferably though not necessarily constant. One can call these "dead" signals.⁷ Telegrams are an obvious example, and also television.

⁷ D. Gabor, "A summary of communication theory," in Communication Theory, Proc. of the London Symposium 1952, Willis Jackson, Ed., Butterworth, London, p. 1; 1953.
⁸ Our present knowledge on the adequate representation of speech sounds is the outcome of just over a hundred years of research, on two main lines. One line set out to answer the question "what features can the ear distinguish in speech sounds?"; the other "what speech sounds can the human vocal organs produce?" Milestones in the first line of approach were the discoveries by Ohm and by Helmholtz regarding the insensitivity of the ear to phases, the recognition by Harvey Fletcher and by Homer Dudley that the intelligence was contained in the instantaneous spectral envelope, and the discovery by J. C. R. Licklider and I. Pollack of the surprising degree of intelligibility after "infinite clipping" has been applied to speech. The other line of approach, from the side of the vocal organs, also started just about a hundred years ago. Alexander Melville Bell was the first to show that only 4-6 slowly varying parameters were involved in the production of speech sounds by the vocal organs. Successful speech-producing mechanisms were devised by Homer Dudley, Schott, Dunn, Cooper, and most recently by Walter Lawrence, who showed convincingly that fully intelligible speech of rather good quality could be synthetized out of 6 slowly varying parameters, each with a frequency band of 25 cys only. These are: the positions of the three principal formants, the intensity of a white noise source, and the intensity and frequency of a larynx-like source, with many harmonics. The total frequency band required is 150 cys. For the time being only synthesis has succeeded, not transmission. The most economical transmission system is still H. Dudley's vocoder, in which 10 channels of 25 cys are used to transmit 10 segments of the spectral envelope, and one more for the pitch of the artificial larynx, making 275 cys in all.

There is no objection to delaying a television transmission by one frame, which then can be coded and decoded as a whole. In the case of "live" signals, such as we have in telephony, radio, etc., the signal space is of less use, because it obscures the elementary fact that there is a certain natural time-order of the samples which are interpreted as coordinates. In a 2FT-dimensional space any coordinate is as near to a given coordinate as any other; they can always be turned into one another by a right-angle rotation. For this reason the signal space is of little help in some of the most important actual problems of communication, such as the compression and recognition of speech, and in all others in which time is involved in a nonelementary manner. In compression problems, which are by far simpler than recognition problems, the difficulty is that we usually do not know the adequate, nonredundant coding. If we know it, as in the case of man-made codes, such as AM, FM and PCM, the extraction of the adequate data from the redundant signals can be carried out in a straightforward manner. But in the case of speech sounds we do not yet know the adequate representation, though the solution of the problem is within sight.⁸ For the present discussion it is not necessary to go into details; it is sufficient to note that there are several lines of converging evidence to the effect that fully intelligible speech of reasonably natural quality can be transmitted in a waveband of not more than 150 cys. Accepting this, the problem of their representation could be attacked in the following way: It is one of the best established results of subjective acoustics that time discrimination of speech sounds by the human ear starts at time intervals of about 25 milliseconds. Signals shorter than this can be reversed without producing any appreciable difference. One can say that the ear applies a short-term Fourier-analysis to the wave amplitudes, and the data inside this interval are experienced as simultaneous. From the other experiments, just mentioned, one must conclude that the


number of these data is not more than four. Hence the acoustical sensations, at least as regards speech, may be best represented by stringing together as a time-sequence instantaneous signal spaces, each with 4 coordinates. These four-dimensional spaces can be mathematically represented in an infinity of ways. In order to find the best representation, it appears natural to try and establish a metric in them, similar to what Erwin Schrödinger has proposed for the three-dimensional representation of our light and color sensations: a unit of the metric, in any direction, shall be the distance between two just distinguishable sensations, the "limen." By definition this must satisfy the two general conditions for a metric: ρ(A, B) = ρ(B, A); ρ(A, B) + ρ(B, C) ≥ ρ(A, C). Then it might or might not be possible to go over to a Euclidean metric by a suitable transformation; this will not be possible in general, but if such a Euclidean representation exists, it will be a preferred representation, with certain special advantages. Unit steps along any coordinate then represent just perceptible differences. But in any case it will be possible to quantize the signal by defining a lattice, such that from any lattice point there shall be a path to any other with points spaced at not more than the limen. (In a three-dimensional Euclidean space this would be the diamond structure.) Once the instantaneous space is quantized, it is preferable to express each lattice point x₁, x₂, x₃, x₄ by a single number. This is then represented by an amplitude sample, of which there are 80 per sec., corresponding to a waveband of 40 cys. (Each tetrad corresponds to an interval of 25 milliseconds, hence one might think that 40 samples and a waveband of 20 cys is sufficient, but the experimental results so far obtained do not encourage such an over-optimistic assumption. 80 samples mean that each instant belongs to two half-overlapping intervals of 25 ms each.)

There are an infinite number of ways in which a set of numbers x₁ ... x₄, which may be integers, can be represented by a single number. For simplicity, assume each x_i can assume n values 0 ... n − 1. We can then write the representative number as an n-nary fraction,

$$z = .x_1 x_2 x_3 x_4.$$

This is an economical representation, because if for example n = 4, the number of combinations is 256 and the representative number assumes 256 contiguous values. But it is a highly asymmetrical representation, because an error up to 1/64 might affect only x₄, which is therefore most exposed to errors. It may be preferable to make the representative number

$$z = .z_1 z_2 z_3 z_4,$$

with the following equations between the digits z and the parameters x:

$$z_1 = \tfrac{1}{2}(-x_1 + x_2 + x_3 + x_4), \qquad z_2 = \tfrac{1}{2}(+x_1 - x_2 + x_3 + x_4),$$

etc., and conversely

$$x_1 = \tfrac{1}{2}(-z_1 + z_2 + z_3 + z_4), \qquad x_2 = \tfrac{1}{2}(+z_1 - z_2 + z_3 + z_4),$$

etc. The matrixes are orthogonal. In this representation an error in any one of the digits affects the parameters by the same absolute amounts. But the representation is not complete; 3/4 of the combinations x₁ ... x₄ give noninteger z-s, while all integer z-s give integer x-s. The best compromise is to make

$$z_1 = \tfrac{1}{2}(-x_1 + x_2 + x_3 + x_4), \qquad x_1 = \tfrac{1}{2}(-z_1 + z_2 + z_3 + z_4),$$

etc. As only integer z-s and x-s are allowed there will be an uncertainty of ±½ for every second value, but this, by assumption, is permissible. After these preparations (some of which will involve much painstaking work) the much more difficult problem of speech recognition can be attacked with a good chance of success. By stringing together the representative numbers in a time sequence we have obtained a representation with a very greatly reduced number of data. For two reasons we have still not got rid of all redundancy. One is the common and obvious psychological experience that if we present to an observer two independent patterns together, each with, say, 100 distinguishable configurations, the resulting number of distinguishable combinations will not be 10,000, but much less. The other is that, of course, even if we had only 10 values for the representative amplitudes, of which there are 80 per sec., 10⁸⁰ is still enormously larger than the number of all distinguishable speech patterns in a second. There is a gap here between the efficiencies of telephony and of telegraphy which it is not quite possible to bridge. On the other hand, even with the enormously reduced number of data on which the recognizer can go to work at this stage, recognition is not yet the clear-cut business which it is in automatic telegraphy. It is well known that "primary recognition," that is to say recognition of a speech sound by itself, applies only to a fraction of the speech sounds. Moreover, as the vowels belong to this class, this is the less important fraction. The rest have to be guessed by transitions and by interpolation from the recognized parts, with the help of previous knowledge of the statistical and grammatical structure of the language used, by personal knowledge of the speaker, of the subject on which he is talking, etc. Some of this goes far beyond what we can expect from any machine at the present stage, and it is rather uncertain whether the "mechanical typist" (even with phonetic spelling) will ever become a practical possibility. I believe that the nonlinear filters, discussed in the text and in the next Appendix, especially the learning type, operating on the basic acoustical parameters, offer a reasonably simple and promising way for dealing at least with the simpler statistical connections in any language.
APPENDIX III
NONLINEAR OPTIMUM FILTERS

The calculation which leads from (5) to the equations for the nonlinear responses is as follows. For simplicity only the first two terms in the expansion will be taken into account.

The mean-square error to be minimized is then

$$\overline{\Big[\sum_{n} s_n r_n + \sum_{n_1}\sum_{n_2} s_{n_1} s_{n_2}\, r_{n_1 n_2} - x_d\Big]^2} = \sum_n \sum_m \overline{s_n s_m}\, r_n r_m + 2\sum_n \sum_{n_1}\sum_{n_2} \overline{s_n s_{n_1} s_{n_2}}\, r_n r_{n_1 n_2} + \sum_{n_1}\sum_{n_2}\sum_{n_3}\sum_{n_4} \overline{s_{n_1} s_{n_2} s_{n_3} s_{n_4}}\, r_{n_1 n_2} r_{n_3 n_4} - 2\sum_n \overline{x_d s_n}\, r_n - 2\sum_{n_1}\sum_{n_2} \overline{x_d s_{n_1} s_{n_2}}\, r_{n_1 n_2} + \overline{x_d^2} = \text{min}.$$

The averages over the signal samples are the higher-order autocorrelation functions, for instance

$$\overline{s_{n_1} s_{n_2} s_{n_3}} = \phi_{sss}(n_1 - n_2,\; n_1 - n_3),$$

with any permutation of n₁, n₂, n₃. There occur also cross-correlation functions of s = x + y with the message x, of the type

$$\overline{x_d s_{n_1} s_{n_2}} = \psi_{xss}(n_1 - d,\; n_2 - d).$$

Minimizing the mean-square error with respect to the first-order responses r_n gives the equations

$$\sum_{m} \phi_{ss}(n - m)\, r_m + \sum_{m_1}\sum_{m_2} \phi_{sss}(n - m_1,\; n - m_2)\, r_{m_1 m_2} = \psi_{xs}(n - d),$$

and minimizing with respect to the second-order responses r_{nm}

$$\sum_{k} \phi_{sss}(k - n,\; k - m)\, r_k + \sum_{m_1}\sum_{m_2} \phi_{ssss}(n - m_1,\; n - m_2,\; n - m)\, r_{m_1 m_2} = \psi_{xss}(n - d,\; m - d).$$

As explained in the text, there is no particular interest in solving these equations, not even with automatic computers, because the learning filter collects the statistical material and solves the equations in one operation. Moreover, there is no particular interest in the filter-response functions themselves. It is rather difficult to multiply a great number of values, but it is comparatively easy to add them up with coefficients ±1, and to raise the sums to the nth power. The set of these sums raised to the nth power is a complete representation of the nth-order products. All that matters for the easy achievement of the relaxation process is that the variable parameters shall occur linearly in the error.

An objection can be raised against all nonlinear filters, and this is just that, being highly nonlinear, they cannot be expected to operate optimally at all levels. This applies even to the linear filter; it operates optimally only at one signal-to-noise ratio. If one trains it at different noise levels, at best it can make a reasonable compromise. In the general case the solution is an even more complicated filter, which has been trained at different levels, has stored away its optimum settings, and supplies the appropriate one in every case. It appears, though, that in the case of speech-recognizing filters it is not necessary to go to such lengths. Speech can be recognized at all levels, over a volume range of at least 60 decibels. The reason for this was revealed by Licklider, when he showed that even infinitely clipped speech is intelligible. In other words, a speech recognizer can be fed with enormously compressed amplitudes, at practically one level only. For the signal samples, as well as for the correlation functions, there will then be only three values, 0 and ±1.

A word may be said about correlation functions as recognizing tools. They have one important qualification for this job, by the fact that they are independent of where the signal occurs, but one might object that they lose their power if the signal is proportionally stretched or compressed in time. This, however, is true only of the first-order correlation function; a look at Fig. 7 will show that the second-order function keeps also a record of the ratios.
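A final sketch (mine, with an invented speech-like tone) of this three-valued arithmetic: after infinite clipping the samples are only 0 and ±1, yet the correlogram still shows the periodicity on which recognition can rely:

```python
import numpy as np

# Sketch: infinite clipping reduces the samples to -1, 0, +1, so the
# correlation functions can be accumulated with additions alone.

rng = np.random.default_rng(5)
t = np.arange(4000) / 8000.0
speech = np.sin(2 * np.pi * 180 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))
speech += 0.1 * rng.normal(size=t.size)

clipped = np.sign(speech)                  # only -1, 0, +1 survive

def autocorr(s, max_lag):
    return np.array([np.mean(s[k:] * s[:len(s) - k]) for k in range(max_lag)])

print(np.round(autocorr(clipped, 50), 2))  # peaks recur at the tone period
```

With a 180-cycle tone sampled 8000 times per second the period is about 44 samples, so the correlogram peaks again near lag 44.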
