Automatic recognition of multiple targets with varying velocities using quadratic correlation filters and Kalman filters
Andres Rodriguez, Jeffrey Panza, B.V.K. Vijaya Kumar
Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA

Abhijit Mahalanobis
Missiles and Fire Control, Lockheed Martin, Orlando, FL

Abstract: Automatic target recognition (ATR) systems require detection, recognition, and tracking algorithms. The classical approach is to treat these three stages separately. In this paper, we investigate a correlation filter (CF)-based approach that combines these tasks for enhanced ATR. We present a Kalman filter framework to combine information from successive correlation outputs in a probabilistic way. Our contribution is a framework that is able to locate multiple targets with different velocities at unknown positions, providing enhanced ATR with only a marginal increase in computation over other CF ATR algorithms.

I. INTRODUCTION

Automatic target recognition (ATR) is a critical aspect of numerous applications including guided weapons, autonomous surveillance, and unmanned vehicles. ATR is composed of detecting, classifying, and sometimes tracking targets. The classical ATR approach used in video sequences is to first pass the current frame through a detection algorithm to identify potential target locations. The identities of the targets at these locations are then determined by a classification algorithm. The locations of these targets are used by trackers, which associate current target locations with previous ones. It has been shown that performance improves by approaching these tasks in a more integrated manner [6].

Correlation filters (CFs) have been successfully applied to a variety of pattern recognition applications [12]. CFs can be designed to yield correlation peaks for each target of interest in the scene while exhibiting low response to clutter and background. Attractive properties such as shift-invariance, noise robustness, graceful degradation, and efficient implementation make CFs well-suited for ATR applications. The simplest CF is the matched filter (MF) with output

g(m, n) = f(m, n) \otimes h(m, n),   (1)

where \otimes represents the discrete 2-D correlation operation, and h(m, n) is the 2-D impulse response of the MF applied to image frame f(m, n). This can be computed efficiently in the Fourier domain as

g(m, n) = \mathcal{F}^{-1}\{\mathcal{F}(f(m, n)) \, \mathcal{F}^{*}(h(m, n))\},   (2)

where \mathcal{F} denotes the 2-D discrete Fourier transform and * denotes complex conjugation.

The MF, however, performs poorly with distorted targets and does not produce sharp peaks.
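To make the Fourier-domain implementation of (2) concrete, the following is a minimal sketch in Python/NumPy (our own illustration, not code from the original system): it cross-correlates a frame with a template using FFTs and returns the correlation plane, whose peak marks the target location.

```python
import numpy as np

def matched_filter_output(frame, template):
    """Compute the 2-D cross-correlation of eq. (2) via FFTs.

    The template is zero-padded to the frame size, so correlation
    peaks appear at the target locations in the frame.
    """
    M, N = frame.shape
    F = np.fft.fft2(frame)
    # Conjugating the template spectrum turns convolution into correlation.
    H = np.fft.fft2(template, s=(M, N))
    return np.fft.ifft2(F * np.conj(H)).real

# Example: a bright 5x5 patch embedded in noise is recovered at (20, 30).
rng = np.random.default_rng(0)
frame = rng.normal(0.0, 1.0, (64, 64))
frame[20:25, 30:35] += 5.0
template = np.ones((5, 5))
g = matched_filter_output(frame, template)
print(np.unravel_index(np.argmax(g), g.shape))  # approximately (20, 30)
```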

During the past three decades, advances in CFs have made them well-suited for ATR applications. The Equal Correlation Peak Synthetic Discriminant Function (ECP-SDF) [5], [3] builds a single CF from a linear combination of training images and constrains the CF response to the training images. By incorporating multiple training images, the resulting CF is more tolerant to the distortions represented in these training images. The Minimum Average Correlation Energy (MACE) filter [10] produces sharp peaks by minimizing the correlation output energy, thereby reducing the sidelobes and generally improving discrimination performance. In this work we use the Quadratic CF (QCF) [9], designed to produce a large separation between two classes.

The peaks in the correlation output in response to a video frame are, traditionally, compared to a threshold to produce a hard decision. This method, however, ignores the positional continuity of a moving target across frames. The recently introduced Multi-Frame Correlation Filter (MFCF) [6] combines information from the current correlation output with previous ones to enhance detection. The MFCF is based on a Bayesian framework where the correlation output is mapped into a probability plane, representing the probability that a target is located at the corresponding pixel position. A fixed 2-D Gaussian, which models the motion of the targets between frames, is then convolved with this probability plane, resulting in an estimate plane, that is, the probability of the target at any location in the next frame. When the next frame is processed, the estimate plane is multiplied by the newly mapped probability plane. As successive input frames become available, the probability of a target given the previous correlation planes is updated to reflect the probability based on both the estimate planes and the current correlation plane. This approach greatly improves the performance of conventional single-frame correlation filtering by suppressing false alarms and improving the ability to detect targets in noise and background clutter.

However, the MFCF motion model is fixed and does not take into account the target's velocity. It is inadequate for the case where there may be multiple targets with different velocities and/or a target exhibiting varying velocity. For example, if the target's displacement is large, the convolution of the probability plane with a 2-D Gaussian will place the target's location under a low probability region in the estimate plane, and the target may not be detected. In addition, the MFCF is not very robust to target occlusions. When a target is occluded, the correlation peak becomes small, leading to a low value in the probability plane.

In this paper we present a framework that overcomes these limitations by using Kalman filters (KFs) to predict the locations of the targets in the next frame. Although there are more advanced trackers (such as feature-aided tracking (FAT) [1]), the purpose of this work is to show that combining a CF with a tracker improves detection, and therefore a simple tracker is sufficient for our purpose. The KF is designed to give the minimum mean squared error (MMSE) estimate of a moving target's state [4]. In this work correlation peaks are the observations to the KFs. Our contribution is a more robust framework that enhances detection. Our algorithm, referred to as the Kalman Correlation Filter (KCF), is robust to variations in velocities and temporary occlusions under low SNR conditions.

The rest of this paper is organized as follows. Section II discusses the QCF and the KCF algorithm. Section III describes our simulation setup. The results are analyzed in Section IV, and we provide concluding remarks in Section V.

II. THEORETICAL FRAMEWORK

A. Quadratic Correlation Filters

The QCF is designed to produce a large separation between two classes. It retains the main advantages of linear CFs and can be efficiently implemented as multiple linear CFs working in parallel to produce a single combined output. To design a QCF, let x be a vectorized image and let T be a coefficient matrix. We want x^T T x to be large positive when x is a true-class target image and large negative when it is a false-class target image. Let \mu_c = E_c\{x^T T x\} indicate the mean QCF output for all training images belonging to class c, where c \in \{true, false\}. We seek a T such that

J(T) = |\mu_{true} - \mu_{false}|   (3)

is a large value. In other words, we want a large between-class separation. Assuming T to be of the form

T = \sum_{i=1}^{p} f_i f_i^T - \sum_{i=1}^{n} b_i b_i^T,   (4)

it can be shown [9] that J(T) is maximized for a given p and n when f_1, ..., f_p, b_1, ..., b_n are the eigenvectors corresponding to the p largest positive and n largest negative eigenvalues of

(R_1 - R_2),   (5)

where R_c = E_c\{x x^T\}. To apply the filter to a test image x(m, n), first let z be a vectorized subregion (equal in size to a training image) of x(m, n) with QCF output y = z^T T z, i.e.,

y = z^T \left(\sum_i f_i f_i^T\right) z - z^T \left(\sum_i b_i b_i^T\right) z = z^T F F^T z - z^T B B^T z = v^T v - w^T w,   (6)

where F = [f_1 ... f_p], B = [b_1 ... b_n], v = F^T z, and w = B^T z. The value of v_i (the i-th element of v) can be obtained for all locations within the image by means of 2-D correlation:

v_i(m, n) = x(m, n) \otimes f_i(m, n),   (7)

where f_i(m, n) is the r \times c reshaped f_i eigenvector. Similarly, the i-th element of w is w_i = x(m, n) \otimes b_i(m, n). The QCF output for all points is

g(m, n) = \sum_i |v_i(m, n)|^2 - \sum_i |w_i(m, n)|^2 = \sum_i |x(m, n) \otimes f_i(m, n)|^2 - \sum_i |x(m, n) \otimes b_i(m, n)|^2,   (8)

which can be efficiently implemented using Fast Fourier Transforms. The correlation output g(m, n) will exhibit peaks at the locations where targets of interest are located. Correlation peaks are often measured in terms of their relative height compared to surrounding values. A commonly used correlation peak metric is the peak-to-sidelobe ratio (PSR), defined as

PSR = (peak - \mu) / \sigma,   (9)

where \mu and \sigma are the mean and the standard deviation, respectively, of a p \times p region surrounding the correlation peak, excluding a small area around the peak. It can be shown that computing the PSR for all values of the correlation output requires four more FFTs [7]. To avoid extra notation, we will simply use g(m, n) without distinguishing between correlation output and PSR, as our approach is applicable in both situations. The correlation peak locations and heights are used in the KF observation model.
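As an illustration of (8) and (9), the sketch below (our own, with hypothetical eigenvector-image inputs fs and bs) evaluates the QCF output with FFT-based correlations and computes a PSR at a candidate peak; the guard and sidelobe sizes are assumed parameters.

```python
import numpy as np

def correlate2d(x, k):
    """FFT-based 2-D cross-correlation, as in eq. (7)."""
    X = np.fft.fft2(x)
    K = np.fft.fft2(k, s=x.shape)
    return np.fft.ifft2(X * np.conj(K)).real

def qcf_output(x, fs, bs):
    """Eq. (8): sum of squared true-class correlations minus sum of
    squared false-class correlations. fs and bs are lists of r-by-c
    eigenvector images (hypothetical inputs)."""
    g = np.zeros_like(x, dtype=float)
    for f in fs:
        g += correlate2d(x, f) ** 2
    for b in bs:
        g -= correlate2d(x, b) ** 2
    return g

def psr_at(g, m, n, sidelobe=20, guard=2):
    """Eq. (9): (peak - mu) / sigma over a sidelobe region around (m, n),
    excluding a small guard area at the peak. Assumes an interior peak."""
    r0, c0 = max(0, m - sidelobe), max(0, n - sidelobe)
    region = g[r0:m + sidelobe + 1, c0:n + sidelobe + 1]
    mask = np.ones(region.shape, dtype=bool)
    mask[max(0, m - r0 - guard):m - r0 + guard + 1,
         max(0, n - c0 - guard):n - c0 + guard + 1] = False
    mu, sigma = region[mask].mean(), region[mask].std()
    return (g[m, n] - mu) / sigma
```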

B. Discrete White Noise Acceleration Kalman Filter

The Kalman filter (KF) approach allows recursive minimum mean squared error (MMSE) estimation of a moving target's state. The KF consists of two models: the state model describes how the system evolves in time, and the observation model describes how the observations are related to the states:

x_t = A x_{t-1} + \Gamma_t   (State model)   (10)
z_t = C x_t + \epsilon_t   (Observation model)   (11)

We are interested in using a discrete white noise acceleration model [2] to allow for velocity changes. The state vector is (KFs assume a multivariate Gaussian distributed state)

x_t = [p_x, p_y, v_x, v_y]^T,   (12)

where p_x and p_y represent the pixel location, and v_x and v_y represent the target velocities in pixels per frame in the x and y directions, respectively. Note that the state represents the target in the image plane and not in world coordinates. The state transition matrix is

A = \begin{bmatrix} 1 & 0 & T & 0 \\ 0 & 1 & 0 & T \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},   (13)

where T (set to 1 without loss of generality) represents the time between measurements. The random noise vector \Gamma_t is modeled as

\Gamma_t = [\gamma_x T^2/2, \; \gamma_y T^2/2, \; \gamma_x T, \; \gamma_y T]^T,   (14)

where \gamma_x and \gamma_y are zero-mean Gaussian random variables with variance \sigma_a^2 = E\{\gamma_x^2\} = E\{\gamma_y^2\} being a constant chosen to represent the noise level. It is suggested [2] that the standard deviation \sigma_a be of the order of the maximum acceleration magnitude a_M; a practical range is 0.5 a_M \le \sigma_a \le a_M. This has the effect of propagating zero-mean noise into the velocities and positions. Expanding the state model equations:

x_t = A x_{t-1} + \Gamma_t
    = \begin{bmatrix} 1 & 0 & T & 0 \\ 0 & 1 & 0 & T \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
      \begin{bmatrix} p_{x,t-1} \\ p_{y,t-1} \\ v_{x,t-1} \\ v_{y,t-1} \end{bmatrix}
    + \begin{bmatrix} \gamma_x T^2/2 \\ \gamma_y T^2/2 \\ \gamma_x T \\ \gamma_y T \end{bmatrix}
    = \begin{bmatrix} p_{x,t-1} + T v_{x,t-1} + \gamma_x T^2/2 \\ p_{y,t-1} + T v_{y,t-1} + \gamma_y T^2/2 \\ v_{x,t-1} + \gamma_x T \\ v_{y,t-1} + \gamma_y T \end{bmatrix}.   (15)

The various noise terms are correlated, and (assuming that the noise terms in the x and y directions are independent) the resulting noise vector has covariance matrix

R_t = E\{\Gamma_t \Gamma_t^T\}   (16)
    = \sigma_a^2 \begin{bmatrix} T^4/4 & 0 & T^3/2 & 0 \\ 0 & T^4/4 & 0 & T^3/2 \\ T^3/2 & 0 & T^2 & 0 \\ 0 & T^3/2 & 0 & T^2 \end{bmatrix}.   (17)
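To make the model concrete, here is a short sketch (our own illustration) that builds the state-transition matrix A of (13) and the process-noise covariance R_t of (17) for a chosen T and sigma_a:

```python
import numpy as np

def dwna_matrices(T=1.0, sigma_a=0.5):
    """State-transition A (eq. 13) and process-noise covariance R_t (eq. 17)
    for the discrete white noise acceleration model.
    State order: [p_x, p_y, v_x, v_y]."""
    A = np.array([[1, 0, T, 0],
                  [0, 1, 0, T],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    # Entries follow the per-axis noise gain [T^2/2, T] of eq. (14),
    # giving the block structure of eq. (17).
    R = sigma_a ** 2 * np.array(
        [[T**4 / 4, 0,        T**3 / 2, 0],
         [0,        T**4 / 4, 0,        T**3 / 2],
         [T**3 / 2, 0,        T**2,     0],
         [0,        T**3 / 2, 0,        T**2]])
    return A, R
```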

We will use the location of the maximum peak in g(m, n) as the observed position of the target in the KF model, i.e.,

z_t = [o_x, o_y]^T,   (18)

where (o_x, o_y) is the location of the peak g_max. Since we only observe the position, our observation matrix is

C = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}.   (19)

Expanding the observation model equations:

z_t = C x_t + \epsilon_t
\begin{bmatrix} o_{x,t} \\ o_{y,t} \end{bmatrix}
  = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}
    \begin{bmatrix} p_{x,t} \\ p_{y,t} \\ v_{x,t} \\ v_{y,t} \end{bmatrix}
  + \begin{bmatrix} \epsilon_{x,t} \\ \epsilon_{y,t} \end{bmatrix}.   (20)

The Gaussian-distributed random vector \epsilon_t in our update equation is assumed to be zero-mean with covariance matrix

Q = E\{\epsilon_t \epsilon_t^T\} = \begin{bmatrix} \sigma_p^2 & 0 \\ 0 & \sigma_p^2 \end{bmatrix}.   (21)

We relate \sigma_p^2 to the peak height of the correlation output g_max. A higher peak represents more confidence in the target location estimate and therefore a small variance; a smaller peak represents lower confidence and therefore a large variance. We model (somewhat arbitrarily; many other such models might work) the relationship between \sigma_p^2 and g_max (shown in Figure 1) as

\sigma_p^2 = 2.5^{(k - g_{max})} + 0.001,   (22)

where g_max is the highest correlation output observed and k is a constant chosen as the average g_max for the visible-target case. We add a small value of 0.001 to keep at least a very small degree of uncertainty even for high correlation outputs.

Figure 1. Relationship between the max PSR g_max and the uncertainty in position measurements \sigma_p^2.
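The following one-liner (our sketch; k is a placeholder that would be set to the average g_max of a visible target) implements the mapping of (22) from a peak value to a measurement variance:

```python
def measurement_variance(g_max, k=1.0):
    """Eq. (22): high peaks give low position variance; the 0.001 floor
    keeps a small residual uncertainty even for strong peaks."""
    return 2.5 ** (k - g_max) + 0.001
```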

C. Combining Kalman Correlation Filters

The KCF allows recursive MMSE estimation of a dynamic target's random state vector, represented by a mean vector and a covariance matrix. Algorithm 1 contains the basic pseudocode for the KCF algorithm for one target; we extend our work to multiple targets in Section II-D. The five equations shown in the algorithm are known as the KF equations. The Kalman gain matrix K weighs confidence in the prediction versus the observation. Complete confidence in the prediction corresponds to a K with zero values; in our experiments, complete confidence in the measurements corresponds to a K with ones along the diagonal.

This KCF algorithm provides a framework which suppresses noise by combining information from current observations with information from all previous correlation outputs. It optimally (in the MMSE sense) computes the probability of a target before and after observing each correlation output. It is robust to variations in velocities and acceleration and can accommodate target occlusions. For example, if the target is moving at a rapid velocity and disappears for a few frames, the KCF can still detect it: it would place low confidence in the observations (since a low g_max yields a very high \sigma_p^2) and more confidence in the prediction, giving an estimated position of the target.

Algorithm 1 Single-Target KCF
1: init params: A, R, C
2: while get another image != NULL do
3:   apply QCF and compute PSR g
4:   observe z = [o_x, o_y]^T as location of g_max
5:   compute measurement error \sigma_p^2 and Q
6:   if not first image then \mu, \Sigma = UPDATE(\bar{\mu}, \bar{\Sigma}, Q, C)
7:   else initialize \mu = [z, 0]^T and \Sigma = diag(\sigma_p^2, \sigma_p^2, \sigma_v^2, \sigma_v^2)
8:   \bar{\mu}, \bar{\Sigma} = PREDICTION(\mu, \Sigma, A, R)
9: end while

1: function \mu, \Sigma = UPDATE(\bar{\mu}, \bar{\Sigma}, Q, C)
2:   K = \bar{\Sigma} C^T (C \bar{\Sigma} C^T + Q)^{-1}
3:   \mu = \bar{\mu} + K(z - C\bar{\mu})
4:   \Sigma = (I - KC)\bar{\Sigma}

1: function \bar{\mu}, \bar{\Sigma} = PREDICTION(\mu, \Sigma, A, R)
2:   \bar{\mu} = A\mu
3:   \bar{\Sigma} = A\Sigma A^T + R
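For concreteness, a compact NumPy sketch of the per-frame loop of Algorithm 1 might look as follows (our own rendering, not the authors' code; qcf_psr stands in for the QCF/PSR computation, k is the placeholder constant of eq. (22), and sigma_v0 is an assumed initial velocity variance):

```python
import numpy as np

def kcf_single_target(frames, qcf_psr, A, R, C, k=1.0, sigma_v0=4.0):
    """Single-target KCF loop following Algorithm 1.
    frames: iterable of 2-D images; qcf_psr: callable returning a PSR plane.
    Returns the list of updated state means [p_x, p_y, v_x, v_y], one per frame."""
    mu_bar, Sigma_bar = None, None
    track = []
    for frame in frames:
        g = qcf_psr(frame)                               # step 3
        oy, ox = np.unravel_index(np.argmax(g), g.shape)
        z = np.array([ox, oy], dtype=float)              # step 4
        var_p = 2.5 ** (k - g.max()) + 0.001             # step 5, eq. (22)
        Q = var_p * np.eye(2)
        if mu_bar is not None:                           # step 6: UPDATE
            K = Sigma_bar @ C.T @ np.linalg.inv(C @ Sigma_bar @ C.T + Q)
            mu = mu_bar + K @ (z - C @ mu_bar)
            Sigma = (np.eye(4) - K @ C) @ Sigma_bar
        else:                                            # step 7: initialize
            mu = np.concatenate([z, [0.0, 0.0]])
            Sigma = np.diag([var_p, var_p, sigma_v0, sigma_v0])
        track.append(mu)
        mu_bar, Sigma_bar = A @ mu, A @ Sigma @ A.T + R  # step 8: PREDICTION
    return track
```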

D. Multitarget ATR

Algorithm 1, however, does not use the full information of the correlation output and therefore only works for one target. In order to take full advantage of the properties of CFs (i.e., locate multiple targets at unknown locations), we make a few modifications to Algorithm 1. Algorithm 2 contains the basic pseudocode for the KCF algorithm for multiple targets.

To locate multiple targets in the first image, we find the highest correlation peak, set to zero a small area around it, find the next highest peak, and repeat this until all peaks above some threshold are located. We have to set to zero an area around the peaks to prevent confusing one peak that extends over a few pixels with multiple targets. For each peak, we assign a state x_i with that position and with zero velocity. The uncertainty of the position depends on the peak height, and the uncertainty in velocity is chosen to be a constant (\sigma_v^2 in Algorithm 2) for the initial states. We then apply the prediction equations in Algorithm 1 to each state.

For subsequent images, we find the highest peak in an area near a state, starting with the state with the smallest uncertainty in position. The size of the area is proportional to the state's position uncertainty. We assign that peak to that state and set to zero a small area around that peak. We repeat this for all the states in order of increasing position uncertainty. We then apply the update equations in Algorithm 1 to each state. We find new targets by repeating the process we used for the initial image on the remaining peaks in the correlation output. Finally, we apply the prediction equations in Algorithm 1 to each state.

If a state's location uncertainty grows above some threshold (the standard deviation in position is larger than the image), then we delete that target, because either the target was lost or it never existed. A state x_i is declared to correspond to a target if the uncertainty of the state's position after the update equations is smaller than some threshold.

Algorithm 2 Multi-Target KCF
1: init params: A, R, C
2: while get another image != NULL do
3:   if not first image then
4:     for each state x_i do (from smallest to largest position uncertainty)
5:       observe z = [o_x, o_y]^T as location of g_max in area x_i +/- \sigma_{x,x_i}, y_i +/- \sigma_{y,y_i}
6:       zero a small area in g around and including z
7:       compute measurement error \sigma_p^2 and Q
8:       \mu_i, \Sigma_i = UPDATE(\bar{\mu}_i, \bar{\Sigma}_i, Q, C)
9:     end for
10:  end if
11:  while g_max > threshold T do  /* find new states */
12:    observe z = [o_x, o_y]^T as location of g_max
13:    zero a small area in g around and including z
14:    compute measurement error \sigma_p^2 and Q
15:    init \mu_n = [z, 0]^T and \Sigma_n = diag(\sigma_p^2, \sigma_p^2, \sigma_v^2, \sigma_v^2)
16:  end while
17:  for each state x_i: \bar{\mu}_i, \bar{\Sigma}_i = PREDICTION(\mu_i, \Sigma_i, A, R)
18:  for all states x_i: if \sigma_{x_i} > size of image then delete state
19:  detection: for all states x_i: if \sigma_{x_i} < threshold T then label state as true target
20: end while
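The peak-extraction-with-zeroing step used in the new-state search (lines 11-16 of Algorithm 2) and in the per-state association (lines 5-6) can be sketched as follows (our own illustration; the zeroing radius is an assumed parameter):

```python
import numpy as np

def extract_peaks(g, threshold, zero_radius=5):
    """Greedily pull peaks above a threshold out of correlation plane g,
    zeroing a small area around each peak so that one broad peak is not
    mistaken for several targets."""
    g = g.copy()
    peaks = []
    while g.max() > threshold:
        m, n = np.unravel_index(np.argmax(g), g.shape)
        peaks.append((m, n, g[m, n]))
        r0, c0 = max(0, m - zero_radius), max(0, n - zero_radius)
        g[r0:m + zero_radius + 1, c0:n + zero_radius + 1] = 0.0
    return peaks
```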

E. Computational Requirements

We measure computation in terms of the number of 2-D FFTs required to apply the QCF to a given frame of the video. If n is the total number of eigenvectors used in the QCF (we usually use 6 <= n <= 16), then n + 1 2-D FFTs are required to apply the QCF. In addition, four 2-D FFTs are required to compute the PSR, so the total number of 2-D FFTs required to apply the filter to each frame is n + 5. Beyond these computations, the MFCF requires a convolution of the posterior with a Gaussian, which costs an additional two FFTs. In contrast, the KCF does not require any additional FFTs: the bottleneck of the KF portion of the KCF is the 4 x 4 matrix multiplications, which are negligible compared to the MFCF's FFT computations.

III. EXPERIMENTAL SETUP

We synthesized videos of moving tanks from three different sets, referred to as Abrams, German, and Missile, as shown in Figure 2. We adapted a system developed by Kerekes [7] to create these videos. Images at multiple depression angles and various azimuths (covering all 360 degrees of rotation) are taken. The tanks are segmented from the images and projected onto a plane using projective geometry to simulate a 3-D world. A simulated camera is used to capture 2-D images of the 3-D world; the collection of captured images forms the video. Our main modification to the system is having the tanks move at different accelerations.

In these experiments, the filter is trained with German (true class) and Abrams (false class) images at a depression angle of 17 degrees and tested with video made from images at a depression angle of 21 degrees. Hence none of the training images are used in testing. We show results for four synthetic video sequences, labeled StraightGAM, CirclerectangleGA, CirclerectangleGG, and CirclerectangleGM, where G, A, and M at the end of the name indicate that the video contains German, Abrams, and/or Missile tanks, respectively. (All the videos are available at the authors' website [11].)

Figure 2. Images of the toy tanks Abrams (left), German (center), and Missile (right) used to test the algorithm.

Figure 3. Frames 3, 4, 5, and 6 of the StraightGAM video. Top: before noise is added (for ease of visibility). Bottom: after the background mean is increased and AWGN is added (the actual frames used in testing).

Figure 4. Frames 2, 7, 11, and 15 of the CirclerectangleGA video. Top: before noise is added (for ease of visibility). Bottom: after the background mean is increased and AWGN is added (the actual frames used in testing).

To each sequence we added white Gaussian noise to yield a desired SNR. All sequences have a background intensity mean of 69.3 (the range is [1, 256]) and added white Gaussian noise corresponding to 7 dB SNR. The SNR value is measured as the ratio of the total energy of the zero-mean image (subtract the mean of the image and then take the sum of the squared magnitudes of the intensity values) before noise is added over the total energy of the added noise.
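As a sketch of this noise-injection step (our own illustration of the stated SNR definition, not the authors' code), one could scale the noise to hit a target SNR as follows:

```python
import numpy as np

def add_awgn(frame, snr_db=7.0, rng=None):
    """Add white Gaussian noise so that the ratio of zero-mean image
    energy to noise energy matches the requested SNR in dB."""
    if rng is None:
        rng = np.random.default_rng(0)
    signal_energy = np.sum((frame - frame.mean()) ** 2)
    noise = rng.normal(0.0, 1.0, frame.shape)
    # Scale noise so signal_energy / noise_energy = 10^(snr_db / 10).
    target_noise_energy = signal_energy / 10 ** (snr_db / 10)
    noise *= np.sqrt(target_noise_energy / np.sum(noise ** 2))
    return frame + noise
```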
The first synthesized video, StraightGAM, contains six tanks, two from each class. Three tanks are stationary, and three move from left to right in the image, being occluded as they pass behind the stationary tanks. The moving tanks each have a different positive acceleration. Figure 3 shows some of the video frames before noise is added (for ease of visibility) and after the background mean is increased and AWGN is added (the actual frames used in testing). To help identify the tanks, in all the following figures we label them with the letters A, G, and M for the Abrams, German, and Missile tanks, respectively.

The second synthesized video, CirclerectangleGA, contains a German and an Abrams tank crossing each other, turning when they reach the end of the screen, and repeating the process to complete one loop. The tanks go underneath a bridge and temporarily disappear from the camera's view. Figure 4 shows some of the video frames.

The third synthesized video, CirclerectangleGG, contains two German tanks crossing each other, turning when they reach the end of the screen, and repeating the process to complete one loop. The tanks go underneath a bridge and temporarily disappear from the camera's view. Figure 5 shows some of the video frames.

The fourth synthesized video, CirclerectangleGM, contains a German and a Missile tank (note that the filter is not trained with Missile images) crossing each other, turning when they reach the end of the screen, and repeating the process to complete one loop. The tanks go underneath a bridge and temporarily disappear from the camera's view. Figure 6 shows some of the video frames.

Figure 5. Frames 2, 7, 11, and 15 of the CirclerectangleGG video. Top: before noise is added (for ease of visibility). Bottom: after the background mean is increased and AWGN is added (the actual frames used in testing).

IV. RESULTS

The results show the improvement offered by the KCF algorithm over the MFCF algorithm. Since Kerekes [8] already showed that the MFCF improves detection over the single-frame detection approach, an improvement over the MFCF method is by default an improvement over common single-frame detection. In basic experiments where the target's acceleration doesn't vary, the target is not occluded, and the velocity is small, both the MFCF and KCF algorithms usually exhibit perfect receiver operating characteristic (ROC) curves. On the other hand, the KCF gives better results in cases where the target is occluded and where acceleration varies, as our results will show.

Figure 6. Frames 2, 7, 11, and 15 of the CirclerectangleGM video. Top: before noise is added (for ease of visibility). Bottom: after the background mean is increased and AWGN is added (the actual frames used in testing).


Figure 7 shows the outputs of Frame 5 of the StraightGAM video: the image frame, the probability image after an observation (update), and the probability image after applying the motion model but prior to the next observation (prediction). The update plot correctly shows two Gaussian distributions centered at the correct locations of the true targets and correctly ignores the four other targets. The prediction plot predicts the next locations of the true targets. Note that the stationary target has its Gaussian centered at the same location, while the moving target has its Gaussian shifted toward the predicted next location of that moving target.

Figure 7. Example output from one frame: (left) the frame, (center) the probability image after an observation, and (right) the probability image after applying the motion model but prior to the next observation.

The ROC curves are shown as the average number of false alarms per frame vs. the detection rate. ROC curves for the KCF and MFCF algorithms for each test video are shown in Figure 8. In the video StraightGAM, the KCF ROC curve shows an improvement over the MFCF algorithm because of the KCF's ability to predict the target's location even when the target's displacement is large. In the videos CirclerectangleGG, CirclerectangleGA, and CirclerectangleGM, the KCF ROC curves show an improvement over the MFCF algorithm because of the KCF's ability to predict the target's location even when the target is temporarily occluded. In practice we would likely use a sensor with higher SNR, resulting in even better performance.

Figure 8. KCF and MFCF ROC curves for the videos (top left) StraightGAM, (top right) CirclerectangleGA, (bottom left) CirclerectangleGG, (bottom right) CirclerectangleGM.

V. CONCLUSIONS

In this paper we presented a framework for ATR that combines information from multiple correlation outputs in a probabilistic way. We used a KF, which allows us to obtain an MMSE estimate of the target's state combined with the benefits of CFs. Our framework keeps the shift-invariance benefit of CFs and is therefore able to locate targets at unknown positions. The correlation outputs are used as observations to the KF, and an optimal (in the MMSE sense) estimate of the target's location is obtained. This leads to enhanced target detection with only a marginal increase in computation over single-frame detection and reduced computation compared to the typical MFCF.

REFERENCES

[1] C. Agate and K. Sullivan. Signature-aided tracking using association hypotheses. In SPIE Proceedings, volume 4729, pages 44-55, 2002.
[2] Y. Bar-Shalom, X. R. Li, and T. Kirubarajan. Estimation with Applications to Tracking and Navigation. Wiley, New York, 2001.
[3] D. Casasent. Unified synthetic discriminant function computational formulation. Applied Optics, 23(10):1620-1627, 1984.
[4] M. H. Hayes. Statistical Digital Signal Processing and Modeling. Wiley, New York, 2006.
[5] C. F. Hester and D. Casasent. Multivariant technique for multiclass pattern recognition. Applied Optics, 19(11):1758-1761, 1980.
[6] R. A. Kerekes, B. Narayanaswamy, M. Beattie, B. V. K. Vijaya Kumar, and M. Savvides. Multiframe distortion tolerant correlation filtering for video sequences. In Proc. of SPIE, volume 6245, pages 9-20, Apr. 2006.
[7] R. A. Kerekes and B. V. K. Vijaya Kumar. Selecting a composite correlation filter design: a survey and comparative study. Optical Engineering, 47(6):067202:1-18, June 2008.
[8] R. A. Kerekes and B. V. K. Vijaya Kumar. Enhanced video-based target detection using multi-frame correlation filtering. IEEE Trans. on Aerospace and Electronic Systems, 45(1):289-307, 2009.
[9] A. Mahalanobis, R. Muise, and S. R. Stanfill. Quadratic correlation filter design methodology for target detection and surveillance applications. Applied Optics, 43(27):5198-5205, 2004.
[10] A. Mahalanobis, B. V. K. Vijaya Kumar, and D. Casasent. Minimum average correlation energy filters. Applied Optics, 26(17):3633-3640, 1987.
[11] A. Rodriguez. www.ece.cmu.edu/afrodrig.
[12] B. V. K. Vijaya Kumar, A. Mahalanobis, and R. D. Juday. Correlation Pattern Recognition. Cambridge University Press, New York, 2005.
