CIM 2018 PROCEEDINGS 101
Proceedings of the 22nd CIM, Udine, November 20-23, 2018
In the recent EU-FET project SkAT-VG¹, phoneticians have had an important role in identifying the most relevant components of non-speech voice productions [19]. They identified the broad categories of phonation (i.e., quasi-periodic oscillations due to vocal fold vibrations), turbulence, slow myoelastic vibrations, and clicks, which can be extracted automatically from audio with time-frequency analysis and machine learning [20], and can be made to correspond to categories of sounds as they are perceived [15], and as they are produced in the physical world. Indeed, it has been argued that human utterances somehow mimic "nature's phonemes" [21].

¹ www.skatvg.eu

1.2 Quantum Frameworks

It was Dennis Gabor [1] who first adopted the mathematics of quantum mechanics to explain acoustic phenomena. In particular, he used operator methods to derive the time-frequency uncertainty relation and the (Gabor) function that satisfies minimal uncertainty. Time-scale representations [22] are more suitable to explain the perceptual decoupling of pitch and timbre, and operator methods can be used as well to derive the gammachirp function, which minimizes uncertainty in the time-scale domain [23]. Research in human and machine hearing [3] has been based on banks of elementary (filter) functions, and these systems are at the core of many successful applications in the audio domain.

Despite its deep roots in the physics of the twentieth century, the sound field has not yet embraced the quantum signal processing framework [24] to seek practical solutions to sound scene representation, separation, and analysis. A quantum approach to music cognition was not proposed until very recently [25], when its explanatory power was demonstrated in describing tonal attraction phenomena in terms of metaphorical forces. The theory of open quantum systems has been applied to music to describe the memory properties (non-Markovianity) of different scores [26]. In the image domain, on the other hand, it has been shown how the quantum framework can be effective in solving problems such as segmentation. For example, the separation of figures from background can be obtained by evolving a solution of the time-dependent Schrödinger equation [27], or by discretizing the time-independent Schrödinger equation [28]. An approach to signal manipulation based on the postulates of quantum mechanics can also potentially lead to a computational advantage when using Quantum Processing Units. Results in this direction are being reported for optimization problems [29].

1.3 Research Direction

In the proposed research path, sound is treated as a superposition of states, and the voice-based components (phonation, turbulence, slow myoelastic pulsations) are considered as observables to be represented as operators. The extractors of the fundamental components, i.e., the measurement apparati, are implemented as signal-processing modules that are available both for analysis and, as control knobs, for synthesis. The baseline is found in the results of the SkAT-VG project, which showed that vocal imitations are optimized representations of referent sounds, emphasizing those features that are important for identification. A large collection of audiovisual recordings of vocal and gestural imitations² offers the opportunity to further inquire how people perceive, represent, and communicate about sounds.

² https://www.ircam.fr/projects/blog/multimodal-database-of-vocal-and-gestural-imitations-elicited-by-sounds/

A first assumption underlying this research approach, largely justified by prior art and experience, is that articulatory primitives used to describe vocal utterances are effective as high-level descriptors of sound in general. This assumption leads naturally to an embodied approach to sound representation, analysis, and synthesis. A second assumption is that the mathematics of quantum mechanics, relying on linear operators in Hilbert spaces, offers a formalism that is suitable to describe the objects composing auditory scenes and their evolution in time. The latter assumption is more adventurous, as this path has not been taken in audio signal processing yet. However, the results coming from neighboring fields (music cognition, image processing) encourage us to explore this direction, and to aim at improved techniques for sound analysis and synthesis.

2. SKETCH OF A QUANTUM VOCAL THEORY

An embryonic theory of sound based on the postulates of quantum mechanics, and using high-level vocal descriptors of sound, can be sketched as follows. Let σ be a vector operator that provides information about the phonetic elements along a specific direction of measurement. Phonation, for example, may be represented by σ_z, with eigenstates representing an upper and a lower pitch. Similarly, the turbulence component may be represented by σ_x, with eigenstates representing turbulence of two different distributions. A measurement of turbulence prepares the system in one of two eigenstates for operator σ_x, and a successive measurement of phonation would find a superposition and get equal probabilities for the two eigenstates of σ_z. The two operators σ_z and σ_x may also be made to correspond to the two components of the classic sines+noise model used in audio signal processing. If we add transients/clicks as a third measurement direction (as in the sines + noise + transients model [12]), we can claim that there is no sound state for which the expectation value of the three components is zero: a sort of spin polarization principle as found in quantum mechanics. The evolution of state vectors in time is unitary, and regulated by a time-dependent Schrödinger equation with a suitably chosen Hamiltonian. The eigenvectors of the Hamiltonian allow any state vector to be expanded in that basis, and the time evolution of such an expansion to be computed. A pair of components can be simultaneously measured only if they commute. If they don't, an uncertainty principle can be derived, as was done for time-frequency and time-scale representations [1, 23]. The theory can be extended to cover multiple sources, and the resulting mixed
states can be described via density matrices, whose time evolution can also be computed if a Hamiltonian operator is properly defined. The effect of disturbances and deviations can be accounted for by introducing relaxation times, to model the return to equilibrium. This sketch of a Quantum Vocal Theory needs to be developed and formally laid down.

3. THE PHON FORMALISM

Consider a 3d space with the orthogonal axes

σ_z: phonation, with different pitches;

σ_x: turbulence, with noises of different frequency distributions;

σ_y: myoelastic, slow pulsations with different tempos.

The phon operator σ is a 3-vector operator that provides information about the phonetic elements in a specific direction of the 3d phonetic space.

In this section we present the phon formalism, obtained by direct analogy with the single spin, as presented in accessible presentations of quantum mechanics [30]. We use standard Dirac notation.

3.1 Measurement along σ_z

If the phon is prepared in |r⟩ (turbulent) and the measurement apparatus is then set to measure σ_z, there will be equal probabilities for |u⟩ or |d⟩ phonation as an outcome. Essentially, we are measuring what kind of phonation is in a pure turbulent state. This measurement property is satisfied if

    |r⟩ = (1/√2)|u⟩ + (1/√2)|d⟩ .    (2)

In fact, any state |A⟩ can be expressed as |A⟩ = α_u|u⟩ + α_d|d⟩, where α_u = ⟨u|A⟩ and α_d = ⟨d|A⟩. The probability to measure pitch up is P_u = ⟨A|u⟩⟨u|A⟩ = α_u*α_u, and the probability to measure pitch down is P_d = ⟨A|d⟩⟨d|A⟩ = α_d*α_d (Born rule).

Likewise, if the phon is prepared in |l⟩ and the measurement apparatus is then set to measure σ_z, there will be equal probabilities for |u⟩ or |d⟩ phonation as an outcome. This measurement property is satisfied if

    |l⟩ = (1/√2)|u⟩ − (1/√2)|d⟩ ,    (3)

which is orthogonal to the linear combination (2). In vector form, we have

    |r⟩ = [1/√2, 1/√2]ᵀ ,    |l⟩ = [1/√2, −1/√2]ᵀ ,

and

    σ_x = [0 1; 1 0] .    (4)
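The states and probabilities above can be checked numerically. The following is a minimal sketch in NumPy (the variable names simply mirror the states and the operator of equations (2)-(4); it is an illustration, not part of the formalism):

```python
import numpy as np

# Basis states of phonation: pitch up |u> and pitch down |d>
u = np.array([1, 0], dtype=complex)
d = np.array([0, 1], dtype=complex)

# Turbulent states (2) and (3): |r> and |l>
r = (u + d) / np.sqrt(2)
l = (u - d) / np.sqrt(2)

# sigma_x of (4); |r> and |l> are its eigenstates, with eigenvalues +1 and -1
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
assert np.allclose(sigma_x @ r, r)
assert np.allclose(sigma_x @ l, -l)
assert np.isclose(np.vdot(r, l), 0)  # (2) and (3) are orthogonal

# Born rule: measuring sigma_z on |r> gives pitch up or down with equal probability
P_u = abs(np.vdot(u, r)) ** 2  # alpha_u* alpha_u
P_d = abs(np.vdot(d, r)) ** 2  # alpha_d* alpha_d
print(round(P_u, 3), round(P_d, 3))  # 0.5 0.5
```

Note that `np.vdot` conjugates its first argument, so `np.vdot(u, r)` is exactly the inner product ⟨u|r⟩.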
3.4 Measurement along an arbitrary direction

Orienting the measurement apparatus along an arbitrary direction n = [n_x, n_y, n_z]′ means taking a weighted mixture:

    σ_n = n_x σ_x + n_y σ_y + n_z σ_z .

3.5 Time evolution

In quantum mechanics the evolution of state vectors in time

    |ψ(t)⟩ = U(t)|ψ(0)⟩    (13)

is governed by the operator U, which is unitary, i.e., U†U = I. Taken a small time increment ε, continuity of the time-development operator gives it the form

    U(ε) = I − iεH ,    (14)

whose energy eigenvalues are E_j = ±1, with energy eigenvectors |E_j⟩.⁴

⁴ We do not need physical dimensional consistency here, so we drop Planck's constant.
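Equations (13) and (14) can be illustrated numerically. In the sketch below, the Hermitian generator H is an arbitrary example chosen for convenience (σ_x, whose eigenvalues are ±1, and which satisfies H² = I so the exact propagator has a closed form); it is not prescribed by the text:

```python
import numpy as np

# An example Hermitian generator: sigma_x, with eigenvalues +1 and -1
H = np.array([[0, 1], [1, 0]], dtype=complex)

# Since H @ H = I, the exact propagator is U(t) = cos(t) I - i sin(t) H
t = 0.7
U = np.cos(t) * np.eye(2) - 1j * np.sin(t) * H

# Unitarity: U† U = I
assert np.allclose(U.conj().T @ U, np.eye(2))

# First-order form (14): U(eps) ~ I - i eps H for a small time increment
eps = 1e-4
U_eps = np.eye(2) - 1j * eps * H
U_exact = np.cos(eps) * np.eye(2) - 1j * np.sin(eps) * H
assert np.allclose(U_eps, U_exact, atol=1e-6)

# Evolution (13): |psi(t)> = U(t)|psi(0)> preserves the norm of the state
psi0 = np.array([1, 0], dtype=complex)
psi_t = U @ psi0
assert np.isclose(np.linalg.norm(psi_t), 1.0)
```

The first-order check makes the continuity argument concrete: for small ε the exact unitary and the linearized form (14) agree up to terms of order ε².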
3.6 Uncertainty

If we measure two observables L and M (in a single experiment) simultaneously, quantum mechanics prescribes that the system is left in a simultaneous eigenvector of the observables only if L and M commute, i.e., [L, M] = 0. Measurement operators along different axes do not commute. For example, [σ_x, σ_y] = 2iσ_z, and therefore phonation and turbulence cannot be simultaneously measured with certainty.

The uncertainty principle, based on the Cauchy-Schwarz inequality in complex vector spaces, prescribes that the product of the two uncertainties is at least as large as half the magnitude of the commutator:

    ΔL ΔM ≥ (1/2) |⟨ψ|[L, M]|ψ⟩| .    (23)

Equation (23) expresses the uncertainty principle. If L = t is the time operator and S = −i d/dt is the frequency operator, and these are applied to the complex oscillator Ae^{iωt}, the time-frequency uncertainty principle results, and uncertainty is minimized by the Gabor function. Starting from the scale operator, the gammachirp function can be derived [23].

4. WHAT TO DO WITH ALL THIS

Assuming that a vocal description of sound follows the postulates of quantum mechanics, the question is how to take advantage of the quantum formalism. How can we practically connect theoretical ideas with audio signal processing?

4.1 Non-commutativity and autostates

We expect that measurement operators along different axes do not commute: this is the case, for example, of measurements of phonation and turbulence. Let A be an audio sample. The measurement of turbulence by the operator T leads to T(A) = A′. A successive measurement of phonation by the operator P gives P(A′) = A″, thus P(A′) = PT(A) = A″. If we perform the measurements in the opposite order, with phonation first and turbulence later, we obtain TP(A) = T(A*) = A**. We expect that [T, P] ≠ 0, and thus that A** ≠ A″. The diagram in figure 1 shows non-commutativity in the style of category theory.

Figure 1. A non-commutative diagram (in the style of category theory), representing the non-commutativity of measurements of phonation (P) and turbulence (T) on audio A.

Measurements of phonation and turbulence can be actually performed using the Harmonic plus Stochastic (HPS) model [11]. The order of operations is visually described in figure 2. The measurement of phonation is performed through the extraction of the harmonic component in the HPS model, while the measurement of turbulence is performed through the extraction of the stochastic component with the same model. The spectrograms for A″ and A** in Figure 3 show the results of such two sequences of analyses on a segment of female speech⁵, confirming that the commutator [T, P] is non-zero.

⁵ https://freesound.org/s/317745/. Hann window of 2048 samples, FFT of 4096 samples, hop size of 1024 samples.

Figure 2. On the left, a voice is analyzed via the HPS model. Then, the stochastic part is submitted to a new analysis. In this way, a measurement of phonation follows a measurement of turbulence. On the right, the measurement of turbulence follows a measurement of phonation.

Essentially, if we adopt the HPS model and skip the final step of addition and inverse transformation, we are left with something that is conceptually equivalent to a quantum destructive measure. Let St be the filter that extracts the stochastic part from a signal. As figure 4 shows, the spectrogram of St(x) is visibly different from the spectrogram of x. Conversely, if we apply St once more, we get a spectrum that does not change much: St²(x) = St(St(x)) ≈ St(x). If we transform back from the second and third spectrograms of figure 4, we get sounds that are very close to each other. In fact, ideally, St²(x) = St(x). It means that, after a measure of the non-harmonic component of some signal, the output signal can be considered as an autostate. If we perform the measure again and again, we still get the same result. Such a measure operation provokes the collapse of a hypothetical underlying wave function, which is originally a superposition of states, and is reduced to a single state upon measurement. The importance of the autostates in this framework is connected with the concept
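The algebra invoked here can be sketched numerically: the commutator [σ_x, σ_y] = 2iσ_z, the idempotence of a measurement that prepares an autostate (St² = St), and the non-commutativity of two measurements along different axes. The toy below uses rank-one projectors as stand-ins for the measurement operators; it is a sketch of the algebraic point, not an implementation of the HPS model:

```python
import numpy as np

sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_y = np.array([[0, -1j], [1j, 0]], dtype=complex)
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

# Commutator [sigma_x, sigma_y] = 2i sigma_z
comm = sigma_x @ sigma_y - sigma_y @ sigma_x
assert np.allclose(comm, 2j * sigma_z)

# Projectors |v><v| onto a phonation eigenstate |u> and a turbulent state |r>
u = np.array([1, 0], dtype=complex)
r = np.array([1, 1], dtype=complex) / np.sqrt(2)
P = np.outer(u, u.conj())  # stand-in for the phonation measure P
T = np.outer(r, r.conj())  # stand-in for the turbulence measure T

# Idempotence: repeating the same measure leaves the autostate unchanged
assert np.allclose(P @ P, P)
assert np.allclose(T @ T, T)

# Non-commutativity: the order of the two measures matters, PT != TP
assert not np.allclose(P @ T, T @ P)
```

The idempotence assertions mirror the observation that St²(x) = St(x): a projective measure, applied twice, is the same as applied once.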
of quantum measures, which may become practically feasible through a set of audio-signal analysis tools.

Figure 3. On the top, the spectrogram corresponding to a measurement of phonation P following a measurement of turbulence T, leading to PT(A) = A″. On the bottom, the spectrogram corresponding to a measurement of turbulence T following a measurement of phonation P, leading to TP(A) = A**.

Figure 4. Top: spectrum of the original sound signal (a female speech), Center: the stochastic component, derived from harmonic plus stochastic analysis (HPS), as the effect of a destructive measure, and Bottom: the stochastic component of the stochastic component itself. The last two spectra are very close.

We can define other filters that extract more specific information: for example, a glissando filter would extract glissando passages within a sound signal, giving as output the inverse transform of the filtered spectrum, that is, a filtered signal where we can just hear the glissando effect and nothing else. Re-performing the glissando measure on that sound, we still get the same sound: the new sound signal is an autostate of glissando. A suitable collection of filters that act both on the original sounds as well as on their vocal imitations can produce simplified spectra, easier to compare. These may give hints on how information is extracted by voice, permitting recognition of the original sound source. This would confirm the importance of the human voice as a probe to investigate the world of sounds, and Quantum Vocal Theory as a bridge between quantum physics, acoustics, and cognition, with possible further bridges to multisensory perception and interaction. If we consider that voice can imitate not only sounds, but also movements and, sometimes, even visual shapes through crossmodal correspondences [31], new fascinating scenarios open up for investigation.

4.2 Time evolution

We know that time evolution of states is governed by the unitary transformation (13) and by the Schrödinger equation (15). A measurement is represented by an operator that acts on the state and causes its collapse onto one of its eigenvectors. The system remains in a superposition of states until a measurement occurs, whose outcome is inherently probabilistic. The states change under the effect of external forces, which therefore determine the change in the probabilities associated with each state. It is a Hamiltonian such as (20) that controls the evolution process. We can think of the vocal Hamiltonian as containing the potential energies due to restoring forces that are contained in the utterance at a particular moment in time. We could even think of a formal composition of forces acting at different levels, and describe them through a nested categorical depiction.

Similarly to what has been done by Youssry et al. [27], the Hamiltonian can be chosen to be time-dependent yet commutative (i.e., [H(0), H(t)] = H(0)H(t) − H(t)H(0) = 0), so that a closed-form solution to state evolution can be obtained. A time-independent Hamiltonian such as the one leading to (22) would not be very useful, both because forces indeed change continuously and because this would lead to an oscillatory solution. Instead, with a time-varying commutative Hamiltonian the time evolution can be expressed as

    |ψ(t)⟩ = e^{−i ∫₀ᵗ H(τ)dτ} |ψ(0)⟩ .    (24)

A simple choice is that of a Hamiltonian such as

    H(t) = g(t)S ,    (25)

with S a time-independent Hermitian matrix. A function g(t) that ensures convergence of the integral in (24) is the damping

    g(t) = e^{−t} .    (26)

In an audio application, we can consider a slice of time and the initial and final states for that slice. We should look for a Hamiltonian that leads to the evolution of the initial state into the final state. In image segmentation [27], where time is used to let each pixel evolve to a final foreground-background assignment, the Hamiltonian is chosen to be

    H = e^{−t} [0 −i; i 0] f(x) ,    (27)

and f(·) is a two-valued function of a feature vector x that contains information about a neighborhood of the pixel.
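With the choices (25)-(26), the evolution (24) can be integrated in closed form, since ∫₀ᵗ e^{−τ}dτ = 1 − e^{−t}. The sketch below takes S to be the Hermitian matrix appearing in (27) (so that S² = I gives an exact propagator) and an arbitrary example initial state; both are illustrative assumptions:

```python
import numpy as np

# Time-independent Hermitian generator, as in (27)
S = np.array([[0, -1j], [1j, 0]], dtype=complex)

def evolve(psi0, t):
    """State at time t under H(tau) = exp(-tau) * S, via (24).

    H(tau) commutes with itself at all times, so (24) applies with
    a = integral of exp(-tau) from 0 to t = 1 - exp(-t). Since S @ S = I,
    the matrix exponential reduces exactly to cos(a) I - i sin(a) S.
    """
    a = 1.0 - np.exp(-t)
    U = np.cos(a) * np.eye(2) - 1j * np.sin(a) * S
    return U @ psi0

psi0 = np.array([1, 0], dtype=complex)  # example initial state

# Unitarity: the norm of the state is preserved at all times
for t in (0.1, 1.0, 10.0):
    assert np.isclose(np.linalg.norm(evolve(psi0, t)), 1.0)

# The damping (26) makes the evolution converge: as t grows, the state
# approaches a fixed point (a -> 1) instead of oscillating forever
limit = np.cos(1.0) * psi0 - 1j * np.sin(1.0) * (S @ psi0)
assert np.allclose(evolve(psi0, 50.0), limit)
```

The last assertion makes concrete why the damping g(t) is useful: the integral in (24) converges, so the state settles rather than oscillating as it would under a time-independent Hamiltonian.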
Such a function is learned from an example image with a given ground truth. In audio we may do something similar and learn from examples of transformations: phonation to phonation, with or without pitch crossing; phonation to turbulence; phonation to myoelastic, etc. We may also add a coefficient to the exponent in (26), to govern the rapidity of transformation. As opposed to image processing, time is our playground.

The matrix S can be set to assume the structure (20), and the components of potential energy found in an utterance field can be extracted as audio features. For example, pitch salience can be extracted from time-frequency analysis [32] and used as the n_z component for the Hamiltonian. Figure 5 shows the two most salient pitches, automatically extracted from a mixture of male and female voices⁶. Frequent up-down jumps are evident, and they make it difficult to track a single voice. Quantum measurement induces state collapse to |u⟩ or |d⟩ and, from that state, evolution can be governed by (24). In this way, it should be possible to mimic human figure-ground attention [33], and follow each individual voice.

6. REFERENCES

[1] D. Gabor, "Acoustical quanta and the theory of hearing," Nature, vol. 159, no. 4044, p. 591, 1947.

[4] G. De Poli, A. Piccialli, and C. Roads, Representations of musical signals. MIT Press, 1991.

[5] D. Roden, "Sonic art and the nature of sonic events," Review of Philosophy and Psychology, vol. 1, no. 1, pp. 141–156, 2010.

[6] M. Leman, Embodied music cognition and mediation technology. MIT Press, 2008.

[7] S. D. Monache, D. Rocchesso, F. Bevilacqua, G. Lemaitre, S. Baldan, and A. Cera, "Embodied sound design," International Journal of Human-Computer Studies, vol. 118, pp. 47–59, 2018.

[8] W. W. Gaver, "How do we hear in the world? Explorations in ecological acoustics," Ecological Psychology, vol. 5, no. 4, pp. 285–313, 1993.

[9] O. Houix, G. Lemaitre, N. Misdariis, P. Susini, and I. Urdapilleta, "A lexical analysis of environmental sound categories," Journal of Experimental Psychology: Applied, vol. 18, no. 1, p. 52, 2012.
[14] V. Isnard, M. Taffou, I. Viaud-Delmon, and C. Suied, "Auditory sketches: very sparse representations of sounds are still recognizable," PloS one, vol. 11, no. 3, 2016.

[15] G. Lemaitre, A. Jabbari, N. Misdariis, O. Houix, and P. Susini, "Vocal imitations of basic auditory features," The Journal of the Acoustical Society of America, vol. 139, no. 1, pp. 290–300, 2016.

[16] G. Lemaitre and D. Rocchesso, "On the effectiveness of vocal imitations and verbal descriptions of sounds," The Journal of the Acoustical Society of America, vol. 135, no. 2, pp. 862–873, 2014.

[17] M. Perlman and G. Lupyan, "People can create iconic vocalizations to communicate various meanings to naïve listeners," Scientific Reports, vol. 8, no. 1, p. 2634, 2018.

[18] Z. Wallmark, M. Iacoboni, C. Deblieck, and R. A. Kendall, "Embodied Listening and Timbre: Perceptual, Acoustical, and Neural Correlates," Music Perception: An Interdisciplinary Journal, vol. 35, no. 3, pp. 332–363, 2018.

[19] P. Helgason, "Sound initiation and source types in human imitations of sounds," in Proceedings of FONETIK 2014, pp. 83–88, 2014.

[20] E. Marchetto and G. Peeters, "Automatic recognition of sound categories from their vocal imitation using audio primitives automatically derived by SI-PLCA and HMM," in Proceedings of the International Symposium on Computer Music Multidisciplinary Research, 2017.

[21] M. Changizi, Harnessed: How language and music mimicked nature and transformed ape to man. BenBella Books, Inc., 2011.

[22] A. De Sena and D. Rocchesso, "A fast Mellin and scale transform," EURASIP Journal on Applied Signal Processing, no. Article ID 8917, 2007.

[23] T. Irino and R. D. Patterson, "A time-domain, level-dependent auditory filter: The gammachirp," The Journal of the Acoustical Society of America, vol. 101, no. 1, pp. 412–419, 1997.

[24] Y. C. Eldar and A. V. Oppenheim, "Quantum signal processing," IEEE Signal Processing Magazine, vol. 19, no. 6, pp. 12–32, 2002.

[25] P. beim Graben and R. Blutner, "Quantum approaches to music cognition," arXiv:1712.07417, 2018.

[26] M. Mannone and G. Compagno, "Characterization of the degree of Musical non-Markovianity," arXiv:1306.0229, 2013.

[27] A. Youssry, A. El-Rafei, and S. Elramly, "A quantum mechanics-based framework for image processing and its application to image segmentation," Quantum Information Processing, vol. 14, no. 10, pp. 3613–3638, 2015.

[28] Ç. Aytekin, E. C. Ozan, S. Kiranyaz, and M. Gabbouj, "Extended quantum cuts for unsupervised salient object extraction," Multimedia Tools and Applications, vol. 76, no. 8, pp. 10443–10463, 2017.

[29] F. Neukart, G. Compostella, C. Seidel, D. von Dollen, S. Yarkoni, and B. Parney, "Traffic flow optimization using a quantum annealer," Frontiers in ICT, vol. 4, p. 29, 2017.

[30] L. Susskind and A. Friedman, Quantum Mechanics: The Theoretical Minimum. UK: Penguin Books, 2015.

[31] C. Spence, "Crossmodal correspondences: a tutorial review," Attention, Perception, and Psychophysics, vol. 73, no. 4, pp. 971–995, 2011.

[32] J. Salamon and E. Gomez, "Melody extraction from polyphonic music signals using pitch contour characteristics," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, pp. 1759–1770, Aug. 2012.

[33] E. Bigand, S. McAdams, and S. Forêt, "Divided attention in music," International Journal of Psychology, vol. 35, no. 6, pp. 270–278, 2000.