Music Perception
Summer 1992, Vol. 9, No. 4, 427-438
By the Regents of the University of California
WILLIAM F. THOMPSON
LOLA L. CUDDY
harmonized music, influences on the perception of key structure and
key change are notably complex. Both melodic and harmonic structure
contribute to a listener's sense of key. It is difficult to isolate their separate
influences, however, because the tonal implications of melody and harmony are highly correlated.
In a previous report, we addressed this problem by investigating a strict
hierarchical description of key (Thompson & Cuddy, 1989). In this description, a melodic line implicates key structure by first implicating an
Method
The method described refers to experimentation with original and Simplification (1)
sequences. Procedures for Simplification (2) sequences were similar (Thompson & Cuddy,
1989).
LISTENERS
Four groups of listeners, each consisting of 20 undergraduate students from the University of Queensland,
Australia, were tested. Listeners were selected from a first-year
subject pool and were given course credit for participating. The participants had little or
no formal training in traditional music theory, but all listened to classical music on a regular
basis or currently played a musical instrument. All subjects reported normal hearing.
APPARATUS AND STIMULI
Tones were triangular waveforms produced by a Yamaha DX-11 synthesizer, controlled
by a Macintosh SE-20 computer. The synthesizer was set to the system of equal temperament with A4 equal to 440 Hz. Sequences were prepared with the music software
package "Professional Composer" and presented under the control of the software package
"Professional Performer." The tempo of each presentation was set to 120 quarter notes
per minute.
Listeners were tested in soundproof booths. Tones and chords were delivered binaurally
through Sennheiser headphones (HDH 424), and responses were entered on the keyboard
of a computer terminal. Before the experimental session began, each listener was allowed
to adjust the average SPL to a comfortable listening level within the range 65 to 75 dB
SPL.
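The tuning and tempo settings above fix the frequency of every tone and the duration of every quarter note. The following is a minimal sketch of that arithmetic, assuming the standard equal-temperament relation (each semitone is a factor of 2^(1/12)); the function names are illustrative and are not taken from the software used in the study.

```python
def equal_tempered_hz(semitones_from_a4: int, a4_hz: float = 440.0) -> float:
    """Frequency of the tone a given number of semitones above (+) or below (-) A4."""
    return a4_hz * 2.0 ** (semitones_from_a4 / 12.0)


def quarter_note_seconds(quarter_notes_per_minute: float = 120.0) -> float:
    """Duration of one quarter note at the stated tempo."""
    return 60.0 / quarter_notes_per_minute


if __name__ == "__main__":
    # C4 lies nine semitones below A4.
    print(round(equal_tempered_hz(-9), 2))
    # Seconds per quarter note at 120 quarter notes per minute.
    print(quarter_note_seconds())
```

At 120 quarter notes per minute each quarter note lasts half a second.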
CHORALE SEQUENCES
Ten phrases were excerpted from the complete set of Bach chorales (Leuchter, 1968).
The original sources are listed in Thompson and Cuddy (1989, appendix). The 10 excerpts
provided two examples of each of five modulation conditions. The five conditions were
as follows: nonmodulating [Condition NM], modulating to the key of the dominant
[Condition M(V)], modulating to the key of the subdominant [Condition M(IV)], modulating to the key of the supertonic [Condition M(II)], and modulating to the key of the
flattened seventh [Condition M(VIIb)].
All excerpts ended with a perfect cadence to the tonic chord of the final key. Two
versions of each sequence were used in the investigation: the original excerpts and a
simplified version of the excerpts [Simplification (1), prepared by Professor F. Lerdahl].
The simplified (1) version of each sequence consisted of eight chords with no ornamental
or passing notes. For each sequence, the key structure, the root of the first chord, and the
roots of the final two chords were always the same in the original and simplified versions.
However, harmonic progressions were sometimes altered in the simplified (1) sequences
in order to equate the point of modulation across all sequences.
Figure 1 shows the original versions of the 10 chorale sequences in musical notation.
Figure 2 shows the simplified (1) versions of the 10 sequences. The types of key movement
displayed in Figures 1 and 2 are as follows: nonmodulating (Sequences 1 and 2),
modulating one step on the cycle of fifths to the dominant or to the subdominant (Sequences
3, 4, 5, and 6), and modulating two steps on the cycle of fifths to the supertonic or to the
flattened seventh (Sequences 7, 8, 9, and 10).¹
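The grouping of the five conditions into zero, one, or two steps on the cycle of fifths can be checked with a short sketch. It assumes that the first and final keys share the same mode and that each key is represented only by the pitch class of its tonic; the interval sizes in semitones (dominant 7, subdominant 5, supertonic 2, flattened seventh 10) are standard music theory, and the function and dictionary names are hypothetical.

```python
def fifths_steps(interval_semitones: int) -> int:
    """Shortest distance on the cycle of fifths for a tonic shift given in semitones."""
    # Moving k fifths shifts the tonic by 7*k semitones (mod 12); since 7*7 = 49 = 1 (mod 12),
    # k can be recovered as 7 * interval (mod 12).
    k = (7 * interval_semitones) % 12
    return min(k, 12 - k)


# Tonic shift, in semitones above the initial tonic, for each modulation condition.
CONDITIONS = {"NM": 0, "M(V)": 7, "M(IV)": 5, "M(II)": 2, "M(VIIb)": 10}

if __name__ == "__main__":
    for name, interval in CONDITIONS.items():
        print(name, fifths_steps(interval))
```

Under these assumptions the dominant and subdominant conditions come out one step away, and the supertonic and flattened-seventh conditions two steps away, in opposite directions around the cycle.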
PROCEDURE
Subjects were randomly assigned to one of four groups. Presentations for the four
groups were as follows: Group 1, harmonic sequences excerpted from Leuchter (1968);
Group 2, single voices of the latter sequences; Group 3, harmonic presentations of
Simplification (1) sequences; Group 4, single voices of the latter sequences. Each trial consisted of an initial melodic pattern of five quarter notes, followed by a pause equal to two
quarter notes, and then a harmonic or single-voice presentation. The initial melodic pattern
outlined the tonic triad of the initial key of the sequence and was included to give the
listener a strong sense for the initial key. Presented in ascending order, the five notes of
the pattern were tonic, tonic one octave above the first tone, mediant, dominant, and tonic
two octaves above the first tone. For harmonic presentations, each harmonic sequence was

1. Sequences 3 and 4 of Thompson and Cuddy (1989) were dropped for reasons explained in that paper, and the sequences of this paper have been renumbered accordingly.

Fig. 1. The original versions of the 10 sequences, excerpted from Bach Chorales (Leuchter,
1968).
of the scale. Listeners were informed that there were no right or wrong answers and that
they should try to use the entire range of the response scale.
Table 1 displays mean ratings of perceived key distance for three types
of key structure and three versions of the sequences. The three types of
key structure in the table are nonmodulating (Nonmod), modulating one
step on the cycle of fifths (Mod 1 step), and modulating two steps on the
cycle of fifths (Mod 2 steps). The upper part of the table shows mean
ratings for four-voice harmonic presentations; the lower part shows mean
ratings for single voices averaged across voices (soprano, alto, tenor, bass).
Table 1 reveals similar patterns of results for each of the three versions.
Single voices, on the average, conveyed as much information about key
change as four-voice harmonic presentations: As the theoretical distance
on the cycle of fifths increased, ratings of perceived distance increased by
about the same amount for four-voice harmony and for single voices.
Analysis of variance yielded strong effects of modulation distance and no
interactions between modulation distance and version of the sequences.
For four-voice harmonic presentations, nonmodulating sequences were
rated lower than modulating sequences [F(1, 57) = 57.19, p < .01], and
key changes of one step on the cycle of fifths were associated with lower
ratings than key changes of two steps [F(1, 57) = 97.18, p < .01]. Similarly, for single-voice presentations, nonmodulating sequences were rated
lower than modulating sequences [F(1, 61) = 190.93, p < .01], and key
changes of one step were associated with lower ratings than key changes
TABLE 1
Mean Ratings of Key Distance for Four-Voice Harmony and Single Voices

                              Version
                              Original   Simplified (1)   Simplified (2)
Four-voice harmony
  Nonmod                      2.58       2.45             2.54
  Mod 1 step                  3.03       3.58             3.32
  Mod 2 steps                 4.46       4.50             4.45
Mean of four single voices
  Nonmod                      2.65       2.49             2.64
  Mod 1 step                  3.38       3.60             3.23
  Mod 2 steps                 4.38       4.50             4.16

Listeners were informed that they should rate the distance between the first and final
key of each presentation on a scale of one to seven. For listeners not familiar with a formal
definition of key, an explanation was provided. This explanation included reference to the
scale, do-re-mi-fa-sol-la-ti-do, and to the sense of stability associated with the first note
of two steps [F(1, 61) = 154.20, p < .01]. Thus, ratings of key distance
were as reliable for single voices as they were for full harmonic sequences.
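The claim that ratings rose "by about the same amount" for four-voice harmony and for single voices can be checked directly against the Table 1 means. The sketch below assumes one reading of the flattened table, in which each run of three values belongs to one version and is ordered nonmodulating, one step, two steps; the variable and function names are illustrative.

```python
# Mean ratings per version, ordered (nonmodulating, one step, two steps).
# Assumed reading of Table 1: each consecutive triple of values belongs to one version.
FOUR_VOICE = {
    "Original": (2.58, 3.03, 4.46),
    "Simplified (1)": (2.45, 3.58, 4.50),
    "Simplified (2)": (2.54, 3.32, 4.45),
}
SINGLE_VOICE = {
    "Original": (2.65, 3.38, 4.38),
    "Simplified (1)": (2.49, 3.60, 4.50),
    "Simplified (2)": (2.64, 3.23, 4.16),
}


def rise(ratings):
    """Increase in mean rating from nonmodulating to two-step modulation."""
    return ratings[2] - ratings[0]


def mean_rise(table):
    """Average rise across the three versions."""
    return sum(rise(r) for r in table.values()) / len(table)


if __name__ == "__main__":
    print(round(mean_rise(FOUR_VOICE), 2), round(mean_rise(SINGLE_VOICE), 2))
```

Under this reading every version shows a monotonic increase with modulation distance, and the average rise is a little under two scale points in both presentation modes.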
ASYMMETRY OF MODULATION DIRECTION
JUDGMENTS

TABLE 2

Voice      Original   Simplified (1)   Simplified (2)
Soprano    2.78       3.57             3.87
Alto       3.88       3.44             3.34
Tenor      4.50       4.59             3.70
Bass       4.38       4.59             3.87

Table 2 displays mean ratings assigned to soprano, alto, tenor, and bass
voices, for each of the three versions, averaged across modulating sequences only. For single-voice presentations, the three versions differed
with respect to the amount of key change conveyed by the four voices
contained in modulating sequences. For the original versions, ratings of
key distance were lower for soprano voices than for other voices. For the
two simplified versions, ratings of key distance for the soprano voice of
most sequences were similar to ratings of key distance for other voices.
In other words, the degree of key change conveyed by single voices was
less balanced across voices in the original versions than in the simplified
versions. These differences were supported by an overall interaction between Voice x Sequence-type x Version [F(24, 732) = 5.42, p < .01].
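The balance across voices can be summarized as the spread (maximum minus minimum) of the Table 2 means within each version. The sketch below assumes that the flattened Table 2 values run row by row (soprano, alto, tenor, bass, with three versions each); that reading is adopted because the resulting per-version averages agree with the single-voice means for modulating sequences in Table 1. All names are illustrative.

```python
# Assumed row-wise reading of Table 2 (modulating sequences only).
TABLE_2 = {
    "Original": {"soprano": 2.78, "alto": 3.88, "tenor": 4.50, "bass": 4.38},
    "Simplified (1)": {"soprano": 3.57, "alto": 3.44, "tenor": 4.59, "bass": 4.59},
    "Simplified (2)": {"soprano": 3.87, "alto": 3.34, "tenor": 3.70, "bass": 3.87},
}
# Table 1 single-voice means for (one-step, two-step) modulating sequences.
TABLE_1_SINGLE_MODULATING = {
    "Original": (3.38, 4.38),
    "Simplified (1)": (3.60, 4.50),
    "Simplified (2)": (3.23, 4.16),
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def spread(voice_means):
    """Difference between the highest and lowest voice mean within one version."""
    return max(voice_means.values()) - min(voice_means.values())


if __name__ == "__main__":
    for version, voices in TABLE_2.items():
        print(version, round(spread(voices), 2), round(mean(voices.values()), 3))
```

With this reading the spread is largest for the original versions and smaller for both simplified versions, which is the pattern the text describes.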
Not surprisingly, voices that introduced the new note or notes involved
in a key change were most informative that a key change had occurred.
For example, ratings of the soprano voice of Sequence 9 were much lower
for the original version (mean, 2.45) than for the simplified (1) version
(mean, 5.15) [t(19) = 5.88, p < .001]. This difference appears to be related to the presence of accidentals in the voice: the original soprano voice
involves no accidentals, but the simplified (1) soprano voice involves two
accidentals (i.e., notes from the new key) in the last bar. For the alto of
the same sequence, mean ratings for the original and simplified (1) version
were 5.55 and 3.25, respectively [t(19) = 4.67, p < .001]. Again, the difference appears to be related to the presence of an accidental. In the
original alto voice, an accidental is introduced by the new key at the
penultimate note. In the simplified (1) alto voice, there are no accidentals.
COMPARISON
As noted in Table 1, mean ratings of key distance for four-voice harmony and for single voices increased by similar amounts as theoretical
distance on the cycle of fifths increased. However, examination of ratings
for the individual sequences within a given modulation distance revealed
that judgments of harmonic presentations and judgments of single-voice
presentations did not always correspond. The musical changes made to
the original sequences to create simplified (1) sequences did not always
have the same effect on judgments of harmonic presentations as they did
on judgments of single-voice presentations. As noted earlier, for example,
ratings of original harmonic sequences contained a directional asymmetry
that was significantly reduced in ratings of simplified (1) harmonic se-
Fig. 3. Mean ratings of key distance for original and simplified (1) versions of Sequence
10, for harmonic and single-voice presentations. [Plot not reproduced; legend: Original, Simplification (1); horizontal axis: presentation condition.]
voices and full harmony: musical details that may reduce key movement
in a harmonic context may actually enhance key movement in single
voices.
Conclusions
Music Perception
Summer 1992, Vol. 9, No. 4, 439-454
© 1992 by the Regents of the University of California
References
Bharucha, J. J. Anchoring effects in music: The resolution of dissonance. Cognitive Psychology, 1984, 16, 485-518.
Bharucha, J. J. Music cognition and perceptual facilitation: A connectionist framework. Music Perception, 1987, 5, 1-30.
Cuddy, L. L., Cohen, A. J., & Mewhort, D. J. K. Perception of structure in short melodic sequences. Journal of Experimental Psychology: Human Perception and Performance, 1981, 7, 869-883.
Cuddy, L. L., Cohen, A. J., & Miller, J. Melody recognition: The experimental application of musical rules. Canadian Journal of Psychology, 1979, 33, 148-157.
Cuddy, L. L., & Lyons, H. I. Musical pattern recognition: A comparison of listening to and studying tonal structures and tonal ambiguities. Psychomusicology, 1981, 1, 15-33.
Dowling, W. J. Scale and contour: Two components of a theory of memory for melodies. Psychological Review, 1978, 85, 341-354.
Krumhansl, C. L. The psychological representation of musical pitch in a tonal context. Cognitive Psychology, 1979, 11, 346-374.
Krumhansl, C. L., Bharucha, J. J., & Castellano, M. A. Key distance effects on perceived harmonic structure in music. Perception and Psychophysics, 1982, 32, 96-108.
Krumhansl, C. L., Bharucha, J. J., & Kessler, E. J. Perceived harmonic structure of chords in three related musical keys. Journal of Experimental Psychology: Human Perception and Performance, 1982, 8, 24-36.
Krumhansl, C. L., & Kessler, E. J. Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys. Psychological Review, 1982, 89, 344-368.
Lerdahl, F. Tonal pitch space. Music Perception, 1988, 5, 315-349.
Leuchter, E. (Ed.). J. S. Bach (386 Chorales). Buenos Aires: Ricordi Americana, 1968.
Schmuckler, M. A. Expectation in music: Investigation of melodic and harmonic processes. Music Perception, 1989, 7, 109-150.
Thompson, W. F., & Cuddy, L. L. Sensitivity to key change in chorale sequences: A comparison of single voices and four-voice harmony. Music Perception, 1989, 7, 151-168.
Introduction
Many incompatible theories about temporal perception and memory
exist, which explain a number of phenomena well, but fail to predict
others. A common theoretical basis for such work would be desirable.
Connectionism might be an attractive paradigm in the search for such a
basis, but most of its models lack compositionality. This means that the
model as a monolithic whole might perform well, but it is impossible to
decompose its complex behavior into meaningful smaller parts. Chandrasekaran (1990) argues that composability is a condition for successful cognitive modeling, even in the connectionist paradigm. In Desain
(1990), the behavior of a subsymbolic (connectionist) model of temporal
quantization was described such that it could be compared with an incompatible symbolic model from the traditional artificial intelligence paradigm. The paper concluded with an abstraction of the behavior of the
quantizer in the form of an "expectancy of events" with a temporal pattern
as prior context. Expectancy turned out to be (de)composable, which
makes it possible to base a theory of perception of complex stimuli on
a simple model for the perception of their constituting components. Be-
3. This research was supported by a Special Projects Grant from the University of
Queensland to the first author, and by research awards from the Natural Sciences and
Engineering Research Council of Canada and the Advisory Research Committee of
Queen's University to the second author.
4. We thank Fred Lerdahl, Beverly Cavanagh, and Keith Hamel for music theoretical
analysis, discussion, and advice. Helpful comments and suggestions were also provided
by Carol Krumhansl.
Requests for reprints may be sent to Peter Desain, NICI, Nijmegen University, P.O.
Box 9104, 6500 NE Nijmegen, The Netherlands.
Music Perception
Spring 1991, Vol. 8, No. 3, 217-240
UNIVERSITY OF CALIFORNIA
ERNST TERHARDT
Music Perception (ISSN 0730-7829) is
published quarterly in Fall, Winter,
Introduction
I regard it a mistake if the theory of consonance is considered as the
essential basis for the theory of music, and I had felt that I had expressed this in the book with sufficient clarity. The essential basis of
music is melody.
H. von Helmholtz
foreword to 3rd ed., The Sensations of Tone (1870)
to the aforementioned analogy between melody and figure-background
distinction.
On low levels of the sensory hierarchy, where intermodal particularities
are most pronounced, analogies do not so easily become apparent but
require a high amount of abstraction from a great variety of empirical data
and observations. If successful, that effort is rewarded by providing both
an equivalent amount of information and a basis for better understanding
high-level processes.
In an attempt to take advantage of those ideas, the present study puts
auditory perception of musical tones and chords into the general conceptual frame of sensory information processing. This requires critical
discussion of some pertinent concepts such as hierarchy, discretization,
information, contour, illusion, ambiguity, and similarity. As a result, it will
become apparent that a number of basic principles of tonal music such
as pitch categorization, tone affinity, chroma, root, and tonality readily
emerge from natural principles of sensory information acquisition. Moreover, an explanation of the so-called tritone paradox will be suggested.
Implications of Hierarchical Processing
what its objective details are. He pointed out that in a highly developed
sensory system, a huge amount of knowledge must be included even on
low levels; that is, knowledge about structures, relationships, and constraints of physical stimulus parameters that carry information on external
objects and events. Moreover, he pointed out that most of that knowledge
has emerged by biological evolution, that is, through interaction with the
physical conditions of the external world. And he clearly expressed the
notion that this process essentially is equivalent to learning by trial and
error by an individual organism. With these notions it becomes apparent
that, for example, Gestalt principles such as proximity, closure, and common fate are not principles per se but have evolved through interaction
with the conditions of the external world, that is, to respond optimally
to any external challenge. This implies that much can be learned about
perception by studying those physical conditions and constraints and their
psychophysical effects. And it implies further that on low levels of the
hierarchy a considerable amount of active, "intelligent" processing must
go on "automatically," that is, unconsciously. Another implication is that
the question of whether the knowledge implemented in a sensory system
is innate or learned in many respects is of minor relevance, because in
either case it has been acquired by trial-and-error interaction with the
external world.
With the following list of principles, an attempt is made to express both
typical characteristics and constraints of sensory hierarchical processing:
Immediate processing: Sensory information processing begins
right at the lowest possible level, that of the peripheral sense
organ.
Open end: The hierarchy does not have a definite number of
levels but is open ended.
Recursion: The basic principles of information processing are
the same on all levels.
Distributed knowledge: The "knowledge" necessary for optimal processing is distributed on all levels so that on each level
the particular type of knowledge that is required for the job
is available.
Forward processing: Information processing is predominantly
forward, that is, from peripheral to central levels.
Autonomy: On a given level, the input that comes from the
preceding level is processed according to the knowledge available on that particular level. On a short-term scale (i.e., leaving
out long-term learning processes), processing is not affected
from any higher level.
Viewback: While (on a short-term time scale) decisions made
on a particular level cannot be changed from a higher level,
Even a superficial inspection, from this point of view, of the visual and
auditory system reveals their "decision machine character" to an overwhelming extent. Most remarkably, both in vision and audition, decision
making actually begins right at the periphery, that is, in the eye's retina
and the ear's organ of Corti. Physiologically this is evident, for example,
in the transformation of the stimulus into sequences of discrete neural
action potentials (nerve impulses) that propagate on discrete nerve fibers.
Psychophysically, it is the phenomenon of contourization that provides
pertinent evidence. In this author's view, contourization provides a key
to a unifying concept of sensory information processing on any level.
Primary Contour and Spectral Pitch
Perception of a visual Gestalt entirely depends on the existence of primary contours. Without contour there is no Gestalt. Formation of contour
and synthesis of Gestalt are complementary, mutually dependent, active
processes. For the understanding of sensory information processing it is
a key notion that even primary contour (i.e., on the retina level) is not
trivial in the sense that it is fully determined by the stimulus alone. What
a stimulus essentially produces on the eye's retina is continuous brightness
and color distribution. Assignment of contours to such a distribution
requires active decisions on the part of the peripheral sensory system. That
type of decision can be regarded as the first step of information processing,
that is, cognitive abstraction. It is in this sense that cognition begins right
at the periphery (cf. the principle of immediate processing). And it is this
notion, in conjunction with the principles of forward processing and
autonomy, that explains to a considerable extent the enormous efficacy
and speed of sensory information processing (e.g., Minsky, 1975). Visual
contours are so important because they represent the most typical and
invariant characteristics of external objects. Formation of contours implies
abstraction from many details of the incoming stimulus (in particular
those that are dependent on intensity and color of illumination) and
extracts the typical shape of external objects.
Most remarkably, it is auditory spectral pitch (i.e., the pitch of part
tones) that, with respect to external "acoustical objects," plays exactly
the same role. From the basic physical parameters of a sound-source signal
(i.e., amplitudes, phases, and frequencies of part tones), it is only the
frequencies that are transmitted with highest fidelity; amplitudes and
phases ordinarily are to a considerable extent corrupted.
Consider, for example, listening to a symphony in a concert hall. One
may without difficulty distinguish one or the other individual instrument,
Fig. 1. Part-tone pattern as a function of time, of a solo soprano singer (first three notes
of "Summertime" by G. Gershwin). Part-tone amplitudes are coded in line thickness. The
information displayed is sufficient to synthesize an audio signal that is aurally almost
indistinguishable from the original. For technical reasons, only the frequency band 0-5
kHz was analyzed. The diagram illustrates the tonal information present on the first level
of abstraction, that is, primary auditory contour. [Plot not reproduced; vertical axis: frequency in kHz.]
it should be noted that these results apply to polyphonic music and multivoice speech as well.
In addition to the analogy between visual primary contour and spectral
pitch, there exist a number of psychophysical phenomena that strongly
support the analogy and further reduce its arbitrariness. First, there are
some accompanying effects such as contrast enhancement (Mach bands;
cf. Carterette, Friedman, & Lovell, 1969; Small & Daniloff, 1967; Summerfield, Haggard, & Foster, 1984; Viemeister, 1980); after-contours
(Fastl, 1986; Wilson, 1970; Zwicker, 1964); and the type of "illusion"
in which perceptions of shape, length, or direction of visual contours are
systematically different from corresponding objective parameters. The auditory equivalent of the latter is subjective shift of spectral pitch (e.g., by superimposed
noise) and octave enlargement of pure tones (Stumpf, 1965; Terhardt,
1971, 1989; Walliser, 1969; Ward, 1954).
Fig. 2. Illustration of virtual contours and virtual figures as an analogy to virtual pitch
and root.
1974). Although this theory originally was not explicitly based on the
intermodal analogies and general principles discussed here, it readily fits
into them. Taking into account the aforementioned analogies between
visual primary contour and spectral pitch, and between visual virtual
contour and virtual pitch, the virtual-pitch theory turns out to be a natural
part of the comprehensive concept of sensory information acquisition.
The significance of the virtual-pitch theory for music perception has
been found to extend far beyond the pitch of single musical tones. The
theory includes explanations for such basic musical phenomena as tone
affinity and pitch ambiguity (e.g., octave equivalence), octave stretch and
stretch of the tone scale, the root phenomenon
(Rameau's "basse fondamentale"), and equivalence of chord inversions. The theory, and its
algorithmic implementations, can be recommended as a tool for music
theory (cf. Terhardt, 1974, 1978, 1979, 1982; Terhardt, Stoll, & Seewann, 1982a,b; Parncutt, 1988, 1989).
Both visual virtual contour and auditory virtual pitch can be regarded
as samples of the immediate, forward processing, autonomous tendency
of low cognitive levels to extract straightforwardly "what a stimulus
means." Of course, this requires "knowledge," that is, use of certain
reasonable criteria. In visual perception, those criteria are dependent on
what type and configuration of objects ordinarily would produce the given
type of stimulus. And the same applies to auditory perception. As was
outlined in the virtual-pitch theory, it is the human speech signal that
probably provides an important reference for aural evaluation of tonal
sounds. Because, for physical reasons, voiced speech elements are composed of
harmonic part tones, the low-level mechanisms of the auditory system
operate on the presumption that this is so for any sound. According to
the theory, this is the reason why auditory creation of virtual pitch consistently obeys the principle of "subharmonic coincidence detection" (Terhardt, 1972, 1974).
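The idea of subharmonic coincidence can be illustrated with a deliberately simplified sketch. It is not the algorithm of Terhardt et al. (1982a), which also assigns weights to pitches; it only counts, for each subharmonic of each part tone, how many part tones have a subharmonic at nearly the same frequency. The function names, the limit of eight subharmonics, the 2% tolerance, and the tie-breaking rule (prefer the highest candidate) are all assumptions made for illustration.

```python
def subharmonic_support(part_tones_hz, max_subharmonic=8, tolerance=0.02):
    """Return (support, candidate_hz) pairs for every subharmonic of every part tone.

    A part tone g supports a candidate if g / n lies within the relative tolerance
    of the candidate for some integer n between 1 and max_subharmonic.
    """
    scored = []
    for f in part_tones_hz:
        for m in range(1, max_subharmonic + 1):
            candidate = f / m
            support = 0
            for g in part_tones_hz:
                n = round(g / candidate)  # nearest subharmonic number of g
                if 1 <= n <= max_subharmonic and abs(g / n - candidate) <= tolerance * candidate:
                    support += 1
            scored.append((support, candidate))
    return scored


def most_supported_pitch(part_tones_hz, max_subharmonic=8, tolerance=0.02):
    """Highest candidate frequency among those supported by the most part tones."""
    scored = subharmonic_support(part_tones_hz, max_subharmonic, tolerance)
    best = max(support for support, _ in scored)
    return max(candidate for support, candidate in scored if support == best)


if __name__ == "__main__":
    # Harmonics 3, 4, and 5 of a 200-Hz fundamental that is itself absent.
    print(most_supported_pitch([600.0, 800.0, 1000.0]))
```

In this toy version, part tones at 600, 800, and 1000 Hz share a subharmonic at 200 Hz, so the most supported candidate is the missing fundamental.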
That behavior is assumed to have been acquired and settled either at
an early age by an individual or through biological evolution. As was
discussed earlier, the relationships between subjective pitch shifts and
interval stretch suggest that the former is the case, that is, learning in early
life. The assumption was made that development of the aural mechanism
that creates virtual pitch is an essential part of the system that processes
speech, that is, abstracts linguistic information from the highly redundant
speech signal. This conclusion is supported by experimental evidence
showing that aural capabilities to normalize phonetic characteristics of
speech exist in early infancy (Kuhl, 1979; Miller, Younger, & Morse,
1982). Moreover, it has been established that a human fetus can hear
already several months before birth (in particular, the mother's voice).
With regard to general principles of biological development, it is very likely
that is primarily characterized by virtual pitches. By viewback, the low-level spectral pitches become recognized also. The virtual-pitch theory
accounts for that multiambiguity by assigning weights to the individual
pitches, either spectral or virtual. This is illustrated by Figure 3, where the
theoretical pitch distributions of two harmonic complex tones are shown.
The assumed fundamental frequencies are 440 Hz (upper diagram) and
220 Hz (lower). The 440-Hz tone is "higher in pitch" than the 220-Hz
tone, not only in the sense that its predominant pitch is higher, but in the
sense that the whole pattern is higher. So when a melodic sequence of
musical tones is considered, the pitch-time contour in the sense of Dowling's (1978) concept should be discussed in terms of the corresponding
sequence of pitch patterns rather than of single pitches.
As can also be seen in Figure 3, in the particular case of a 2:1 fundamental frequency ratio there appears a type of similarity between the
two tones that is determined by identity of some pitches. When these two
tones are played one after the other, a portion of pitches of the first will
be "echoed" by the second. By visual analogy one can express this by
saying that the second Gestalt shares a number of contours with the first
one. It is apparent that this effect will promote a tendency to perceive the
second tone just as a replication of the first.
This is no less than a simple and straightforward explanation of octave
equivalence, that is, chroma. This explanation is very similar to that suggested already by Helmholtz (1954). However, an important new aspect
is provided by the presence of virtual pitches in the patterns. Whereas
Helmholtz had considered only the pattern of spectral pitches (which
extends from the fundamental frequency to several harmonics), the
present pitch patterns include a considerable number of virtual pitches that
are below the tone's fundamental frequency. This enhances the chance of
higher-level pitches of successive tones coinciding and thus provides considerable additional evidence for Helmholtz's conclusion.
Although most pronounced for a 2:1 frequency ratio, the effect of
similarity by coincidence of pitches applies to the ratio 2:3 as well, that
is, the fifth. For that ratio, the number and weight of coinciding pitches
is less than for the octave. This accounts both for the existence of "fifth
equivalence" and for the fact that it is less pronounced than octave equivalence. Taking advantage of the described principles, one can design models for quantitative evaluation of tone affinities, another contribution to
a scientifically based music theory. The recent work of Parncutt (1988,
1989) provides solutions of that type.
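The comparison between the octave and the fifth can be made concrete with an unweighted toy model. In the sketch below, the pitch pattern of a harmonic complex tone is taken to be its first six harmonics (standing in for spectral pitches) plus its subharmonics down to one sixth of the fundamental (standing in for virtual pitches); these limits, the absence of pitch weights, and the function names are assumptions for illustration and do not reproduce the calculation of Terhardt et al. (1982a). Exact fractions are used so that coincidence means exact equality.

```python
from fractions import Fraction


def pitch_pattern(fundamental_hz, n_harmonics=6, n_subharmonics=6):
    """Toy pitch pattern: harmonics f*k (k = 1..n) plus subharmonics f/m (m = 2..n)."""
    f = Fraction(fundamental_hz)
    spectral = {f * k for k in range(1, n_harmonics + 1)}
    virtual = {f / m for m in range(2, n_subharmonics + 1)}
    return spectral | virtual


def shared_pitches(f1_hz, f2_hz):
    """Pitches that occur in the toy patterns of both tones."""
    return pitch_pattern(f1_hz) & pitch_pattern(f2_hz)


if __name__ == "__main__":
    print(len(shared_pitches(220, 440)))  # octave, ratio 1:2
    print(len(shared_pitches(220, 330)))  # fifth, ratio 2:3
```

In this toy model the octave pair shares more pitches than the fifth pair, in line with the qualitative statement that fifth equivalence is less pronounced than octave equivalence.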
While the above considerations were made on the basis of data provided
by the virtual-pitch theory, the ambiguous second-level pitch patterns of
individual musical tones have been experimentally verified. Figure 4 shows
a number of pitch histograms that were obtained by pitch matches to
harmonic complex tones with the fundamental frequencies indicated (from
Fig. 3. Theoretical pitch patterns, on the second level of abstraction, of two harmonic
complex tones with fundamental frequencies 220 and 440 Hz. Virtual pitches: v; spectral
pitches: s. Note ambiguity of pitch and partial coincidence of pitches in the two patterns.
Calculation of pitches and pitch weights as described by Terhardt et al. (1982a). [Plot not reproduced; panels: fundamental frequency 440 Hz and 220 Hz; horizontal axis: pitch-equivalent frequency.]
[Figure 4, plot not reproduced; panels labeled by fundamental frequency, 60 to 900 Hz; horizontal axis: frequency of matching tone, in kHz.]
(1)
where n = 0, 1, 2, 3, ..., and f is a low base frequency, for example, in
the region below 100 Hz. The part tones either cover the entire audible
frequency range (so that the lowest and highest merely fall below the
threshold of hearing); or they are limited by a bandpass filter to a certain
frequency band (for details see Shepard, 1964; Deutsch, 1986).
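Eq. (1) itself is not legible in this copy. The sketch below therefore assumes the usual octave-spaced form for this type of stimulus, with part tones at f, 2f, 4f, and so on (that is, 2^n times the base frequency f), which is consistent with the definition of n and f above. The function name and the explicit upper frequency limit are illustrative assumptions.

```python
from typing import List


def shepard_part_tones(base_hz: float, upper_hz: float) -> List[float]:
    """Octave-spaced part-tone frequencies base_hz * 2**n that do not exceed upper_hz."""
    tones = []
    n = 0
    while base_hz * 2 ** n <= upper_hz:
        tones.append(base_hz * 2 ** n)
        n += 1
    return tones


if __name__ == "__main__":
    # A base frequency below 100 Hz and an illustrative upper limit of 6 kHz.
    for frequency in shepard_part_tones(16.35, 6000.0):
        print(round(frequency, 2))
```

Because every part tone is an octave above the previous one, all of them belong to the same pitch class, which is why the pitch class of such a tone is well defined while its height is not.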
In that type of complex-tone stimulus, pitch information is reduced in
such a way that the auditory pitch-evaluation mechanism cannot reasonably assign to it one dominant virtual pitch. What is heard is rather
a set of virtual pitches that are in an octave relationship to each other and
do not much differ in prominence. So, while the "chroma" or "pitch class"
of such a complex tone is well defined, its height is not. In a number of
experiments, Shepard (1964), Burns (1981), and Ohgushi (1985) have
demonstrated several aspects of the "circularity" of perceived pitch that
is typical for that type of stimulus.
pitches pertinent to each tone are lined up vertically. The theoretical prominence (pitch weight) of the pitches is indicated by the area of black
squares. The ordinate is scaled in semitones, and the reference frequency
both for the pitch classes on the abscissa and the semitone scale at the
ordinate is 440 Hz for A4. The Shepard tones were assumed to be composed according to Eq. (1) with base frequencies f of 16.35-32.70 Hz
(depending on pitch class). The SPLs of part tones were assumed to be
50 dB, and part tones were included from f up to maximally 6 kHz.
One can see in Figure 5 that, for example, the Shepard tone with pitch
class C (first item at the abscissa; base frequency f = 16.35 Hz) produces
pitches at 24, 36, 48, etc. semitones, corresponding to the pitch heights
C2, C3, C4, etc. The area of squares indicates that the pitches C4 and C5
(262 and 525 Hz equivalent frequency) are most prominent. The second
pitch class, F#, was created with the base frequency f = 23.12 Hz, that
is, higher than that of C by a tritone ratio. One can see, however, that
this is virtually irrelevant where the region of most prominent pitches is
concerned: the pitches corresponding to the base frequency and its second
harmonic are more or less ignored both by the ear and the theory.
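The relation between the semitone scale and pitch-equivalent frequency used in this description can be checked with a short Python sketch. It assumes equal temperament with the scale origin at C = 16.35 Hz, as stated for the ordinate of Figure 5; the function names are hypothetical.

```python
import math

C0_HZ = 16.35  # scale origin: C at 16.35 Hz (0 semitones)


def freq_from_semitones(semitones):
    """Equal-tempered frequency for a height in semitones above C (16.35 Hz)."""
    return C0_HZ * 2 ** (semitones / 12)


def semitones_above_c0(freq_hz):
    """Height in semitones above C (16.35 Hz) for a given frequency."""
    return 12 * math.log2(freq_hz / C0_HZ)


# A tritone (6 semitones) above the base frequency of C
tritone_base = freq_from_semitones(6)   # about 23.12 Hz
# Pitch at 48 semitones, i.e., C4
c4 = freq_from_semitones(48)            # about 262 Hz
```

The first value reproduces the base frequency quoted for the second pitch class, and the second reproduces the equivalent frequency quoted for C4.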
To explain the tritone paradox on that basis, and within the general
approach put forward in the present study, one merely needs to take into
account that the pitch patterns shown in Figure 5 visualize a second-level
cognitive representation of the corresponding Shepard tones. Evaluation
of whether the tritone intervals C-F#, C#-G, etc., are ascending or
descending is thus a challenge to the third cognitive level. Although the
criteria for that evaluation are not a priori evident, visual inspection of
the diagram (Figure 5) suggests a reasonable solution, namely, to trace the
direction in which the most prominent pitches move from the first to the
second Shepard tone. When (arbitrarily) the two most prominent pitches
are selected, one arrives at the solution indicated by arrows. The latter
indeed reflect what was termed the tritone paradox: that the direction
of perceived pitch height is dependent on pitch class.
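The decision rule just described can be illustrated with a toy Python sketch. It is not the virtual-pitch computation of Terhardt et al. (1982a): the pitch weights are replaced by a hypothetical Gaussian "preference region" on the semitone scale, and the values of `center` and `sigma` are assumptions chosen only so that the region lies between roughly 36 and 60 semitones. All function names are hypothetical.

```python
import math


def shepard_pitches(pitch_class, lo=12, hi=84):
    """Octave-related pitch heights (semitones above C at 16.35 Hz) of one pitch class."""
    return [s for s in range(lo, hi + 1) if s % 12 == pitch_class % 12]


def weight(semitone, center=52.5, sigma=8.0):
    """Hypothetical stand-in for pitch weight: Gaussian preference region."""
    return math.exp(-0.5 * ((semitone - center) / sigma) ** 2)


def top_two(pitch_class, center=52.5):
    """The two most prominent pitch heights under the assumed weighting."""
    ranked = sorted(shepard_pitches(pitch_class),
                    key=lambda s: weight(s, center), reverse=True)
    return sorted(ranked[:2])


def tritone_direction(first_class, center=52.5):
    """Direction in which the two most prominent pitches move across a tritone."""
    first = top_two(first_class, center)
    second = top_two(first_class + 6, center)
    return "ascending" if sum(second) > sum(first) else "descending"


# Direction for each of the 12 tritone pairs (0 = C, 1 = C#, ...)
directions = {pc: tritone_direction(pc) for pc in range(12)}
```

With these assumed parameters the pair C-F# comes out descending while F-B comes out ascending, so the judged direction depends on pitch class. Shifting `center` by a few semitones changes which pairs flip.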
While the phenomenon that is decisive for this explanation, namely,
preference of a particular absolute pitch region, is included in the virtual-pitch theory, individual variations are not, of course. It is not difficult to
see that a small deviation from the average of the position and/or shape
of the "preference characteristics" of a particular subject can considerably
affect the "ascending/descending" judgments. As "preference characteristics" are just a parameter of auditory cognitive strategy, there may indeed
exist systematic individual differences. For example, with respect to the
aforementioned theoretical relationships between virtual-pitch evaluation
and speech perception, one may speculate that acoustic parameters of both
external voices and one's own voice, by exposure in early life, may have
[Figure 5 plot omitted. Ordinate ticks from 12 to 84 semitones; abscissa labeled by pitch class.]
Fig. 5. Theoretical pitch patterns of Shepard tones. Pitch class is indicated on the abscissa.
Pitches pertinent to each tone are vertically lined up. Ordinate: Pitch-equivalent frequency,
expressed in semitones above C (16.35 Hz). Area of squares is proportional to calculated
pitch weight, that is, represents prominence. Note that for all Shepard tones, the most
prominent pitches are in the height region of about 36 to 60 semitones, corresponding
to 131-525 Hz. On the abscissa the Shepard tones are arranged in pairs of tritone intervals.
The existence of the "preference region" causes systematic, pitch-class-dependent ascending or descending of the two most prominent pitches within each pair (arrows). This
explains the tritone paradox. Computation as described by Terhardt et al. (1982a).
Concluding Discussion
After the foregoing considerations of the information processing hierarchy and some of its low-level implications for the perception of music,
it would be only consistent to proceed with a discussion of third- and higher-level processes. It is essentially on the third level that temporally coded
information comes into play, that is, the musical information included in
melodic contour, root progression, and rhythm. As much experimental
and theoretical research has already been done on those topics (e.g., Deutsch,
1982a), it is an interesting challenge to incorporate them into the present
general approach. Of course, this would far exceed the scope of the
present study.
There exist a number of experimental investigations pertinent to the
borderline between low-level, static and higher-level, dynamic cognitive
processes, providing a kind of interface. Of particular relevance here are:
- The "continuity effect" and "pulsation threshold" (e.g., Thurlow, 1957; Warren, Obusek, & Ackroff, 1972; Houtgast, 1974);
- Relationships between perceived rhythm and spectral- and virtual-pitch patterns such as demonstrated by van Noorden (1975);
- Virtual pitch evoked by nonsimultaneous harmonics (Hall & Peters, 1981).
Finally, some biological aspects of the present approach deserve to be
mentioned. Although the present approach does not give an immediate
answer to the question of a possible "survival value of music" (Roederer,
1984), it provides a pertinent message. That message essentially is that
there exist a number of fundamental principles of sensory information
acquisition that are decisive for survival:
- Immediate conditioned reaction to an external challenge.
- As a provision of appropriate reaction: Immediate, autonomous processing of the information included in any sensory stimulus (i.e., abstraction).
- As the basic element of abstraction: Discretization (contourization) by conditioned decision.
- As a tool and provision for efficient abstraction: Evolution, acquisition, and utilization of distributed knowledge.
As outlined in the present study, it is those survival-relevant principles that
govern auditory perception of tonal music as well.
References
Bernstein, L. The unanswered question. Cambridge, MA: Harvard University Press, 1976.
Bregman, A. S., & Campbell, J. Primary auditory stream segregation and the perception of order in rapid sequences of tones. Journal of Experimental Psychology, 1971, 89, 244-249.
Burns, E. Circularity in relative pitch judgments for inharmonic tones: The Shepard demonstration revisited, again. Perception & Psychophysics, 1981, 30, 467-472.
Carterette, E. C., Friedman, M. P., & Lovell, J. D. Mach bands in hearing. Journal of the Acoustical Society of America, 1969, 45, 986-998.
Carterette, E. C., Kohl, D. V., & Pitt, M. A. Similarities among transformed melodies: The abstraction of invariants. Music Perception, 1986, 3, 393-410.
Clarkson, M. G., & Clifton, R. K. Infant pitch perception: Evidence for responding to pitch categories and the missing fundamental. Journal of the Acoustical Society of America, 1985, 77, 1521-1528.
Demany, L., & Armand, F. The perceptual reality of tone chroma in early infancy. Journal of the Acoustical Society of America, 1984, 76, 57-66.
Deutsch, D. Music recognition. Psychological Review, 1969, 76, 300-307.
Deutsch, D. (Ed.) The psychology of music. New York: Academic Press, 1982a.
Deutsch, D. Grouping mechanisms in music. In D. Deutsch (Ed.), The psychology of music. New York: Academic Press, 1982b.
Deutsch, D. A musical paradox. Music Perception, 1986, 3, 275-280.
Deutsch, D. The semitone paradox. Music Perception, 1988, 6, 115-132.
Deutsch, D., Kuyper, W. L., & Fisher, Y. The tritone paradox: Its presence and form of distribution in a general population. Music Perception, 1987, 5, 79-92.
Deutsch, D., Moore, F. R., & Dolson, M. Pitch classes differ with respect to height. Music Perception, 1984, 2, 265-271.
Dowling, W. J. Scale and contour: Two components of a theory of memory for melodies. Psychological Review, 1978, 85, 341-354.
Fastl, H. Auditory after-images produced by complex tones with a spectral gap. In Proceedings of the 12th International Congress on Acoustics. Bl:--5. Toronto: Beauregard Press, 1986.
Hall, J. W., & Peters, R. W. Pitch for nonsimultaneous successive harmonics in quiet and noise. Journal of the Acoustical Society of America, 1981, 69, 509-513.
Hall, J. W., & Peters, R. W. Change in the pitch of a complex tone following its association with a second complex tone. Journal of the Acoustical Society of America, 1982, 71, 142-146.
Hall, J. W., & Soderquist, D. R. Transient complex and pure tone pitch changes by adaptation. Journal of the Acoustical Society of America, 1982, 71, 665-670.
1. This study was worked out in the Sonderforschungsbereich 204 "Gehör", München,
supported by the Deutsche Forschungsgemeinschaft.
2. Portions of this paper were presented at the First International Conference on Music
Perception and Cognition, Kyoto, Japan, October 1989.