2012
Boeriis and Holsanova: Tracking visual segmentation. Visual Communication. DOI 10.1177/1470357212446408
visual communication
ARTICLE
MORTEN BOERIIS
University of Southern Denmark, Odense, Denmark
JANA HOLSANOVA
Lund University, Sweden
ABSTRACT
This article introduces a new methodology for deriving the dynamics of visual
segmentation in relation to the underlying cognitive processes involved. The
method combines social semiotics approaches to visual segmentation with
eye-tracking studies on authentic image viewing and simultaneous image
description. The authors' thesis is that visual segmentation suggested by
the social semiotic approach is traceable in the behaviour of the viewers
who perceive images while creating meaning. From this perspective, visual
zooming is seen as perceptually, cognitively, grammatically and analytically
relevant. The interdisciplinary approach developed in the article presents
new perspectives on the ways images are segmented and interpreted.
KEYWORDS
cognition • dynamic rank scale • eye tracking • image description • image
viewing • social semiotics • visual segmentation
1 INTRODUCTION
The notion of visual segmentation, image segmentation or scene segmenta-
tion has been discussed in various research areas. This has been done in visual
perception (in terms of natural partitioning of a given image into meaningful
regions and objects conducted by the viewer's visual and cognitive system; see
Henderson, 2007; Holsanova, 2001, 2008), in computer vision and image
processing (in terms of an automated partitioning of a given image into
meaningful regions and objects conducted by feature extraction, edge detection,
and region extraction; see DeLiang Wang, 2003) and in social semiotics (in
terms of a contextually and functionally based description and partitioning of
images as a communicative resource for meaning-making; see Boeriis, 2009,
2012; O'Toole, 2011[1994]).
SAGE Publications (Los Angeles, London, New Delhi, Singapore and Washington DC:
http://vcj.sagepub.com) Copyright © The Author(s), 2012.
Reprints and permissions: http://www.sagepub.co.uk/journalspermissions.nav/
Vol 11(3): 259–281 DOI 10.1177/1470357212446408
The human visual system extracts and groups similar features and
separates dissimilar ones, segments the scene into coherent patterns, identifies
objects, distinguishes between figure and ground, perceives object properties
and parts, and recognizes perceptual organization (Palmer, 1999). Our ability
to perceive figures and meaningful wholes instead of collections of lines has
been described in terms of gestalt principles of (a) proximity, (b) similarity,
(c) closure, (d) continuation, and (e) symmetry (Köhler, 1947). For instance,
the proximity principle implies that features and objects placed close to each
other appear as groups rather than a random cluster. The similarity principle
means that there is a tendency for elements of the same shape, brightness
or colour to be seen as belonging together. The principle of closure implies
that elements tend to be grouped together if they are parts of a closed figure.
The continuation principle means that units that are aligned appear as inte-
grated perceptual wholes. Finally, the symmetry principle implies that regions
bounded by symmetrical borders tend to be perceived as coherent figures.
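The proximity principle can be loosely illustrated in computational terms. The following Python sketch is not part of the authors' method and is not a model of human perception; it simply groups points whose mutual distance falls below a threshold, so that elements placed close to each other end up in the same group. The function name group_by_proximity, the threshold max_gap and the coordinates are all assumptions made for illustration.

```python
from math import dist


def group_by_proximity(points, max_gap):
    """Group points so that any two points within max_gap of each other
    (directly or via a chain of neighbours) share a group."""
    groups = []
    for p in points:
        near, far = [], []
        for g in groups:
            # A group is "near" if at least one of its members is close to p.
            if any(dist(p, q) <= max_gap for q in g):
                near.append(g)
            else:
                far.append(g)
        # Merge p with every nearby group; leave distant groups untouched.
        merged = [p] + [q for g in near for q in g]
        groups = far + [merged]
    return groups


# Hypothetical element positions on a two-dimensional canvas.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
groups = group_by_proximity(points, max_gap=1.5)
print(sorted(len(g) for g in groups))
```

With these invented coordinates, the three elements near the origin form one group and the two distant elements another, which is the kind of grouping the proximity principle describes.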
In the following, we will focus on visual segmentation in social semi-
otics and in visual perception and cognition. The aim of our case study is to
compare the social semiotics approach to visual segmentation (Boeriis, 2009,
2012) to eye-tracking studies of authentic image viewing and simultaneous
image description (Holsanova, 2001). In particular, we will first apply cate-
gories and rank mechanisms from the social semiotic framework to a com-
plex image. We will then compare the results of the semiotic analysis with the
results of the eye-tracking study of authentic viewing and description of the
same image. Our thesis is that visual segmentation suggested by the social
semiotic approach is traceable in the behaviour of the viewers who perceive
visuals while creating meaning. It becomes visible in visual fixation patterns
accompanied by a simultaneous image description. A comparison of semiotic
analysis with eye-tracking measurements has proved useful in earlier work
(Holsanova et al., 2006). By combining these two perspectives and methodologies, we hope to
come to a better understanding of the dynamics of visual segmentation.
2 BACKGROUND
2.1 Visual segmentation and the social semiotic approach
The social semiotic approach describes communicative resources as intersubjective,
emergent phenomena rather than normative, rule-governed ones. The resource
systems emerge from social use and, as such, are regulated by socially conven-
tionalized communicative aptness (Kress, 2010: 55). The individual visual text
is the result of intentional strategic choices from the resource systems regu-
lated by the given context. Consequently, an image is seen as a communicative
3 METHOD
Firstly, we will describe visual segmentation based on a social semiotic
approach (Baldry and Thibault, 2006; Boeriis, 2012; O'Toole, 1994). Secondly,
we will present the cognitive approach and introduce the method of combining
eye-movement analysis with simultaneous image descriptions (Holsanova,
2001).
Figure 3 Complex picture: the motif comes from a children's book by Sven Nordqvist
(1990). Reproduced by kind permission from Sven Nordqvist.
Each line in the transcript represents a new verbal focus expressing the content
of active consciousness. A verbal focus is usually a phrase or a short clause,
delimited by prosodic and acoustic features: it has one primary accent, a
coherent intonation contour, and is usually preceded by a pause or hesitation
(Holsanova, 2001: 15ff.). It implies that one new idea is formulated at a time
and active information is replaced by other, partially different information
at approximately two-second intervals (Chafe, 1994). Several verbal foci are
clustered into superfoci (for example, a summarizing superfocus or a list of
items in the above example, delimited by lines). A verbal superfocus is a coher-
ent chunk of speech, typically a longer sentence, consisting of several foci con-
nected by the same thematic aspect and having a sentence-final prosodic pat-
tern. Superfoci can be conceived of as thresholds into a new complex unit of
thought (Holsanova, 2008: 8ff.).
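As a rough illustration of how pauses can delimit verbal foci, the following Python sketch splits a time-stamped word sequence wherever the silence before a word reaches a threshold. It is a simplification under stated assumptions: the delimitation described above also relies on accent and intonation contour, which the sketch ignores. The function name split_into_foci, the 0.3-second threshold and all timings are invented for illustration; only the wording is borrowed from the example utterance in this article.

```python
def split_into_foci(words, min_pause=0.3):
    """Start a new focus whenever the silence before a word
    lasts at least min_pause seconds.

    words: list of (text, start_seconds, end_seconds) tuples in spoken order.
    """
    foci = []
    prev_end = None
    for text, start, end in words:
        if prev_end is None or start - prev_end >= min_pause:
            foci.append([])  # a pause (or the first word) opens a new focus
        foci[-1].append(text)
        prev_end = end
    return [" ".join(focus) for focus in foci]


# Invented timings in seconds; they are not measurements from the study.
words = [
    ("in", 0.0, 0.1), ("the", 0.1, 0.2), ("middle", 0.2, 0.6),
    ("is", 0.6, 0.7), ("a", 0.7, 0.8), ("tree", 0.8, 1.2),
    ("with", 1.7, 1.9), ("one", 1.9, 2.1),
    ("with", 2.6, 2.8), ("three", 2.8, 3.1), ("birds", 3.1, 3.5),
    ("doing", 4.0, 4.3), ("different", 4.3, 4.8), ("things", 4.8, 5.2),
]
foci = split_into_foci(words)
for focus in foci:
    print(focus)
```

Grouping several such foci into a superfocus would require a further criterion, such as a sentence-final prosodic pattern, which is not modelled here.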
Both the visual scanpath (Figure 4) and the verbal transcript illus-
trate the result of 7 seconds of viewing and description. They can be seen as a
functionally delimited segmentation unit. If we want to look at the process of
image viewing and image description in detail, however, we need to use a dif-
[Figure: VERBAL tier reading "in the middle is a tree with one with three birds doing different things", plotted against time]
4 EMPIRICAL EXAMPLES
In this section, we will show examples of how the categories from the social
semiotic framework can be applied to the complex image that served as a
stimulus in the cognitive eye-tracking study (see Figure 3). We will also show
examples of units created and delimited by informants during authentic image
viewing and image description (Holsanova, 2001, 2008). This section will be
concluded by a comparison of the social semiotic and cognitive approaches.
5 RELEVANCE
The results of this explorative interdisciplinary study of visual segmentation
are relevant for further investigation of the rank scale in visual social semiot-
ics in general and for multimodal rank scale theory in particular. They are of
importance for further investigations of visual segmentation in scene percep-
tion and give new perspectives on the discussion of conceptual categories
and on the role of bottom-up and top-down processes. The results can also
contribute to future research on grammatical/structural functions that may
not as yet be empirically verifiable, as well as to research on circumstantial
meaning.
The ranking mechanisms and methods presented here can be applied,
for instance, to figurative images, photographs, illustrations, graphics and layout
on a two-dimensional visual canvas. Even theories of moving images may
benefit from the dynamic approach. Other modalities, such as three-dimensional
or auditory communication, however, will probably employ other systems,
rank scales and ranking mechanisms.
The combination of the social semiotic and cognitive approaches
is beneficial in several ways. Empirical data can verify or refute theoretical
assumptions and categories deduced from visual grammar and perhaps suggest
new issues that have not yet been addressed. This may prompt reconsideration
of the aptness of certain categories in the grammar from which they
originally stem.
6 LIMITATIONS
This is merely an explorative case study and it is not based on hundreds of
viewers or hundreds of images. The Pettson image cannot be considered rep-
resentative of images in general, and therefore all tendencies revealed by this
investigation have to be tested with a whole range of other images and with
larger groups of viewers. Also, there is a need for carefully designed controlled
experiments that would investigate in a systematic way the role of factors such
as individual differences and expertise, the role of the task or instruction and
the role of bottom-up and top-down processes for visual segmentation.
The fact that the social semiotic dynamic visual rank scale applied
here is only a first tentative proposition for a rank scale is also a limitation.
Moreover, the dynamic rank scale needs further discussion and development.
Another potential limitation may be that the eye-tracking and verbal protocols
are restricted by limitations in the respondents' knowledge and awareness.
7 DISCUSSION
This study compares the social semiotic model of visual segmentation with
eye-tracking studies of image viewing and simultaneous image description.
Our main thesis was verified as we found that visual segmentation was trace-
able in the behaviour of viewers who perceive visuals while creating meaning.
We found the concept of the visual zoom applicable in both a social semiotic
and a perceptual cognitive approach. Also, we found largely coinciding
segmentation categories (rank scale levels) and ranking mechanisms in the two
approaches, as many of the semiotic categories were demonstrated by
eye-tracking and verbal protocols. The inspiration of gestalt theory was one clear
common denominator which facilitated the union of the cognitive and social
semiotic approaches to visual segmentation.
The different approaches revealed different perspectives on the same
phenomenon and there were of course discrepancies in what was emphasized
by the two approaches. Certain distinct aspects that appear important in the
empirical study of image perception were not as accentuated in the social
semiotic approach (and vice versa). Among these discrepancies were: (1) the
temporal aspects of image perception and the dynamics of visual segmenta-
tion; (2) the role of individual differences and expertise; (3) the role of the
context, task, instruction or goal; (4) the role of an implicit model reader; (5)
the role of interpersonal segmentation; (6) the role of paradigmatic relations
within system resources; and (7) the understanding of salience. Space con-
straints prohibit us from further elaboration of these discrepancies, but they
would all be very interesting areas for future investigation.
The semiotic focus and the reception focus did yield interesting perspectives
on each other. From both perspectives, and in agreement between them, we
found that rank scale segmentation plays a very important role in visual
meaning-making. This indicates that the two approaches can indeed sup-
port each other, and in combining the two perspectives and methodologies,
we came to a better understanding of the dynamics of visual segmentation
and the underlying cognitive processes. Even though this is merely a first
ACKNOWLEDGEMENTS
We would like to thank Kay O'Halloran, Anders Björkvall, Roger Johansson
and the Eye Tracking Group at Lund Humanities Lab for their comments on
previous versions of the manuscript. The work has been supported by the
Linnaeus Center for Thinking in Time: Cognition, Communication and
Learning (CCL) at Lund University, funded by the Swedish Research Council
(grant no. 349-2007-8695). The work has also been supported by the Faculty of
Humanities and the Institute of Language and Communication at the University
of Southern Denmark.
REFERENCES
Baldry, A.P. and Thibault, P.J. (2006) Multimodal Transcription and Text
Analysis. London: Equinox.
Bateman, J.A. (2008) Multimodality and Genre: A Foundation for the Systematic
Analysis of Multimodal Documents. London: Palgrave Macmillan.
Boeriis, M. (2008) 'Mastering Multimodal Complexity', in N. Nørgaard (ed.)
Systemic Functional Linguistics in Use. Odense Working Papers in
Language and Communication, Vol. 29, pp. 219–36. Odense: University
of Southern Denmark.
Boeriis, M. (2009) 'Multimodal Socialsemiotik og Levende Billeder', PhD
thesis, University of Southern Denmark, Odense.
Boeriis, M. (2012) 'Tekstzoom – om en dynamisk funktionel rangstruktur
i visuelle tekster', in T. Andersen and M. Boeriis (eds) Nordisk
Socialsemiotik - multimodale, pædagogiske og sprogvidenskabelige
landvindinger, pp. 131–153. Odense: University Press of Southern
Denmark.
Buswell, G.T. (1935) How People Look at Pictures: A Study of the Psychology of
Perception in Art. Chicago: University of Chicago Press.
Chafe, W.L. (1994) Discourse, Consciousness, and Time: The Flow and
Displacement of Conscious Experience in Speaking and Writing. Chicago:
University of Chicago Press.
DeLiang Wang (2003) 'Visual Scene Segmentation', in M.A. Arbib (ed.) The
Handbook of Brain Theory and Neural Networks, 2nd edn, pp. 1215–19.
Cambridge, MA: MIT Press.
Dewhurst, R. et al. (2012) 'It Depends on How You Look at It: Scanpath
Comparison in Multiple Dimensions with MultiMatch, a Vector-Based
Approach', Behavior Research Methods, 42(2). Published online 31
May 2012. DOI 10.3758/s13428-012-0212-2.
BIOGRAPHICAL NOTES
MORTEN BOERIIS is an Assistant Professor at the Institute of Language
and Communication at the University of Southern Denmark. He specializes
in multimodality, visual communication, moving images and business com-
munication, and teaches various courses at the Department of International
Business Communication.
Address: Institute of Language and Communication, University of Southern
Denmark, Campusvej 55, DK-5230 Odense M, Denmark. [email:
boeriis@language.sdu.dk]