Professional Documents
Culture Documents
Attention Detection
Authorized licensed use limited to: Nanyang Technological University. Downloaded on April 25, 2009 at 09:59 from IEEE Xplore. Restrictions apply.
1 X receptive field inhibition for contour detection [2], we ar-
|SDMs,k (i + u, j + v) − SDMs,k (i, j)| (2) gue that local context information is essential to discrimi-
N u,v
nate between true and spurious ARs. The influence of con-
with N being the number of patches in the neighborhood. text can be brought about by suppressing the saliency of
A measure for texture contrast at a patch (i, j) and at any a region whose neighborhood has similar contrast, through
scale s and orientation k is calculated as the process of surround suppression. In Figure 2(a), for ex-
T Cs,k (i, j) = AM Ds,k (i, j) × ASDDs,k (i, j) (3) ample, the ARs caused by the background bushes and the
while the final texture contrast at patch (i, j) is obtained as grass need to be suppressed to capture the true AR of the
XX antelopes as perceived by the human visual system. Here,
T C(i, j) = T Cs,k (i, j) (4) we propose a local context suppression strategy to adap-
s k
The above texture cue captures an AR even if other cues tively combine multiple attention cues like intensity, color
like intensity and color fail. There are several situations in and texture. Consider an image divided into blocks, each
which the AR is captured using texture as an aid, e.g. tex- Patch A
ture foreground in non-textured background, non-textured
foreground in texture background, regular texture in a ran-
dom textured background, random texture in regular tex- Patch B
tured background etc. Figure 1 shows two examples where
texture is useful as a cue to detect salient objects that capture Patch C
Authorized licensed use limited to: Nanyang Technological University. Downloaded on April 25, 2009 at 09:59 from IEEE Xplore. Restrictions apply.
contrast measure (equation (5)) is suppressed if the patch Original Image
and patch B is from the true AR of the antelopes. Fig- Figure 3: The proposed local context suppression process
ure 2(b) shows eigen ellipsoids corresponding to each patch.
For each patch, the ellipsoid is centered at the mean of the
feature contrast vectors over the neighborhood of the patch.
Each axis of the ellipsoids points towards the eigen vectors 4. Experiment Results
of the ACCM while their semi-radii are proportional to the We demonstrate the efficacy of the texture attention cue and
eigen values. As seen from the figure, patch C has small new feature combination scheme on images selected from
variance along all the axes of the ellipsoid while patch A the Corel Photo Library. The images were selected in such
has small variance along two of its axes. However, patch a way as to ensure that they contained ‘true ARs’ as per-
B has large variance along all the 3 axes indicating higher ceived by the human visual system. However, we do real-
discriminating ability with respect to its neighborhood and ize that a ‘true AR’ is a subjective phenomenon. As men-
should therefore belong to a true AR, which indeed it does. tioned earlier, we consider 3 cues, the contrasts of intensity
Thus, it is required that the contribution to the saliency map (I = (R + G + B)/3), color (hue and saturation in the HSV
of patches A and C should be suppressed while that of patch space) and the proposed texture contrast. The size of an At-
B should be enhanced. The suppression Qp factor (SF) for tention Patch was 8 × 8. To determine the texture map, the
patch (i, j) is obtained as τ (i, j) = u=1 λ̄u where the Gabor Wavelet Transform used 4 scales and 6 orientations.
λ̄’s are sorted in ascending order and the parameter p con- The parameter p to control the SF was chosen as 2. Figure
trols the degree of suppression. The saliency value S(i, j) 4 (a) and (b), respectively, show the original images and the
for patch (i, j) is obtained in two steps: first the multiple at- visual ARs detected by the proposed method. We compare
tention cues or the contrast maps are linearly combined and the saliency map with those obtained using the algorithms
the result is modulated by the SF as in [5] and [3] whose results are shown in Figure 4 (c) and (d)
X
k respectively. It is evident that the proposed method is able
S(i, j) = τ (i, j) × F Vu (i, j) (6) to capture the visual AR better than other two methods.
u=1 For an objective evaluation, we manually segmented out
Figure 3 shows the steps leading to the final saliency map the salient object(s) and defined an Average Discrimination
of the image shown in Figure 2(a). The intensity and color Ratio (ADR) as
contrast maps are obtained using equation (5) and the tex- P
( (i,j)∈ϕ S(i, j))/|ϕ|
ture contrast map is obtained as described in Section 2. ADR = P P
( (i,j)∈ϕ S(i, j))/|ϕ| + ( (i,j)∈ψ S(i, j))/|ψ|
The linear combination of the contrast maps is implemented
(7)
simply as their sum and the SF for each patch is displayed
where |ϕ| and |ψ| represent the cardinality of the set of pix-
as an image with darker regions representing high SF and
els belonging to the salient region and the non salient re-
brighter regions representing low SF. The product of the
gions, respectively. Since the true ARs must contain high
combined map and the SF yields the final saliency map
saliency, the higher the ADR, better is the detection of the
which contains the true AR. Note that both the color and
true AR. We added White Gaussian Noise to the images
texture contrast maps have been able to indicate the true
and compare the ADR’s obtained using our method with
AR to some extent. However, the former has more spurious
those in [5] and [3]. Figure 5 shows the plot of ADR with
ARs than the latter. Moreover, using the proposed SF, these
increasing variance of noise for each of the methods. The
spurious regions have been successfully removed.
Authorized licensed use limited to: Nanyang Technological University. Downloaded on April 25, 2009 at 09:59 from IEEE Xplore. Restrictions apply.
proposed method has significantly higher ADR of above Average Discriminanation Ratio
1
0.9 compared to the others, implying that the saliency value Context Suppression Model
Itti’s VAM [5]
CSI [3]
of the non salient regions is close to zero. It is interesting to
note that all the methods perform consistently with increase 0.9
0.8
0.75
0.7
0.65
1 2 3 4 5 6 7 8 9 10
Gaussian Noise Variance
References
[1] L.Q. Chen, X. Xie, X. Fan, W.Y. Ma, H.J. Zhang, and H.Q.
Zhou. A visual attention model for adapting images on small
displays. ACM Multimedia Systems Journal, 9(4):353–364,
2003.
[2] C. Grigorescu, N. Petkov, and M.A. Westenberg. Contour
and boundary detection improved by surround suppression
of texture edges. Journal of Image and Vision Computing,
22(8):583–679, 2004.
[3] Y. Hu, X. Xie, W.Y. Ma, L.T. Chia, and D. Rajan. Salient
region detection using weighted feature maps based on the
human visual attention model. In Proceedings of the Fifth
IEEE Pacific-Rim Conference on Multimedia, Tokyo Water-
front City, Japan, November 2004.
[4] L. Itti and C. Koch. A comparison of feature combination
strategies for saliency-based visual attention systems. In Proc.
SPIE Human Vision and Electronic Imaging IV (HVEI’99),
volume 3644, pages 473–482, San Jose, CA, 1999.
(a) (b) (c) (d)
[5] L. Itti, C. Koch, and E. Niebur. A model of saliency-based
Figure 4: Experiment Results (a) Original Image; Saliency visual attention for rapid scene analysis. IEEE Transactions
Maps using (b) proposed method (c) Itti’s Model [5] (d) CSI on Pattern Analysis and Machine Intelligence, 20(11):1254–
[3] 1259, 1998.
[6] Y.F. Ma and H.J. Zhang. Contrast-based image attention anal-
ysis by using fuzzy growing. In Proceedings of the Eleventh
5. Conclusions ACM International Conference on Multimedia, Berkeley, CA,
USA, 2003. ACM Press.
In this paper, we propose a useful attention cue of texture in-
formation and design a general context suppression model [7] B. S. Manjunath and W.Y. Ma. Texture feature for browsing
and retrieval of image data. IEEE Transactions on Pattern
inspired by the mechanism of non-classical receptive field
Analysis and Machine Intelligence, 18(8):837–842, 1996.
inhibition of the primate visual cortex for attention detec-
tion. This model analyzes the context contrast variance [8] D. Walther, L. Itti, M. Riesenhuber, T. Poggio, and C. Koch.
adaptively to find the potential similarity of context contrast Attentional selection for object recognition - a gentle way.
and uses this factor to highlight the true attention regions Lecture Notes in Computer Science, 25(25):472–279, 2002.
and suppress spurious ARs simultaneously. To evaluate the [9] X.J. Wang, W.Y. Ma, and X. Li. Data-driven approach for
performance of the proposed method, we introduce a cri- bridging the cognitive gap in image retrieval. In Proceed-
teria to test its discriminating power between salient object ings of 2004 IEEE International conference on Multimedia
regions and background. In the future work, we will inte- and Expo, Taipei, 2004.
grate scale-space information to our model.
Authorized licensed use limited to: Nanyang Technological University. Downloaded on April 25, 2009 at 09:59 from IEEE Xplore. Restrictions apply.