
MODERN PROBABILITY AND STATISTICS

SELECTED TOPICS IN CHARACTERISTIC FUNCTIONS


ALSO AVAILABLE IN
MODERN PROBABILITY AND STATISTICS:

Chance and Stability: stable distributions and their applications


Vladimir V. Uchaikin and Vladimir M. Zolotarev

Normal Approximation: New Results, Methods and Problems


Vladimir V. Senatov

Modern Theory of Summation of Random Variables


Vladimir M. Zolotarev
MODERN
PROBABILITY AND STATISTICS

Selected Topics in
Characteristic Functions

Nikolai G. USHAKOV
Russian Academy of Sciences

VSP
UTRECHT, THE NETHERLANDS, 1999

VSP BV
P.O. Box 346
3700 AH Zeist
The Netherlands

Tel: +31 30 692 5790
Fax: +31 30 693 2081
E-mail: vsppub@compuserve.com
Home Page: http://www.vsppub.com

© VSP BV 1999

First published in 1999

ISBN 90-6764-307-6

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior
permission of the copyright owner.

Printed in The Netherlands by Ridderprint bv, Ridderkerk.


Contents

Preface vii

Notation ix

1 Basic properties 1
1.1 Definition and elementary properties 1
1.2 Continuity theorems and inversion formulas 4
1.3 Criteria 8
1.4 Inequalities 23
1.5 Characteristic functions and moments, expansions of character-
istic functions, asymptotic behavior 39
1.6 Unimodality 43
1.7 Analyticity of characteristic functions 52
1.8 Multivariate characteristic functions 54
1.9 Notes 64

2 Inequalities 67
2.1 Auxiliary results 67
2.2 Inequalities for characteristic functions of distributions with
bounded support 81
2.3 Moment inequalities 88
2.4 Estimates for the characteristic functions of unimodal distribu-
tions 95
2.5 Estimates for the characteristic functions of absolutely continu-
ous distributions 100
2.6 Estimates for the characteristic functions of discrete distribu-
tions 111
2.7 Inequalities for multivariate characteristic functions 114
2.8 Inequalities involving integrals of characteristic functions . . . . 133
2.9 Inequalities involving differences of characteristic functions . . . 142
2.10 Estimates for the first positive zero of a characteristic function . 154
2.11 Notes 158


3 Empirical characteristic functions 159
3.1 Definition and basic properties 159
3.2 Asymptotic properties of empirical characteristic functions . . . 164
3.3 The first positive zero 182
3.4 Parameter estimation 187
3.5 Non-parametric density estimation I 198
3.6 Non-parametric density estimation II 210
3.7 Tests for independence 226
3.8 Tests for symmetry 232
3.9 Testing for normality 242
3.10 Goodness-of-fit tests based on empirical characteristic functions 251
3.11 Notes 256

A Examples 259

B Some characteristic functions 281

C Unsolved problems 331

Bibliography 335

Subject Index 351

Author Index 353

Preface

Characteristic functions play an outstanding role in the theory of probability


and mathematical statistics. For this reason, there are a number of excellent
books concerning the mathematical theory of characteristic functions and their
applications. But, at the same time (and for the same reason), the abundance
of theoretical results on characteristic functions and the variety of their
applications is so great that one book (of reasonable volume) can hardly hold
all of them. Moreover, this field is developing intensively, and many recently
obtained results are not covered by the existing monographs. Inequalities for
characteristic functions and the theory of the empirical characteristic
function are among those parts of the theory of characteristic functions that
are inadequately reflected in monographs. This book is intended to fill this gap.
The monograph consists of three chapters. Chapter 1 contains basic gen-
eral results concerning characteristic functions. Usually, proofs are given only
for those (mainly recent) results which are not included in existing textbooks
and monographs. The main goal of this chapter is to be a handbook on char-
acteristic functions. At the same time, it contains some new results.
Various inequalities for characteristic functions are presented in Chap-
ter 2 (both in univariate and multivariate cases). Lower and upper bounds for
characteristic functions are given both in the neighborhood of the origin and
for large values of the argument. The estimates involve such characteristics
as moments, the maximal value or the total variation of the probability den-
sity function, etc. Some inequalities involving integrals of the characteristic
function are given in Section 2.8. In Section 2.9, recent results are presented
concerning the problem of estimating the closeness of distribution functions in
terms of the closeness of the corresponding characteristic functions. The last
section is devoted to the problem of estimating the position of the first zero of
a characteristic function.
In Chapter 3, we study empirical characteristic functions and their
applications in statistics. We begin with the statistical properties of the
empirical characteristic function as an estimator of the underlying
characteristic function. The results of this part of the chapter are the
theoretical foundation for the statistical inference based on empirical
characteristic functions which is presented in the remainder of the chapter.
Some results are presented on statistical estimation (both parametric and
nonparametric) and on testing statistical hypotheses.
A collection of various examples, counterexamples and assertions,
demonstrating interesting (sometimes unexpected) properties of characteristic
functions, is presented in Appendix A. Constructing counterexamples
permanently accompanies the investigation of characteristic functions and
their applications and sometimes leads to beautiful and intriguing results.
Such results are often not only of theoretical interest but helpful in solving
applied problems as well. Many examples have already been included in
monographs concerning counterexamples in probability theory and statistics
and in those concerning characteristic functions. However, recently quite a
number of new interesting examples have appeared in various publications, and
the goal of the appendix is to bring together and systematize all known
counterexamples.
In Appendix B, we give formulas and graphs of some frequently used char-
acteristic functions as well as characteristic functions demonstrating interest-
ing properties.
In Appendix C, several unsolved problems are presented.
I am very grateful to Yu.V. Prokhorov, D.V. Belomestnii, A.V. Kolchin,
V.Yu. Korolev, and A.P. Ushakova for many useful comments.

N. G. Ushakov

Moscow, May 1999.

Notation

Throughout the book, a triple numeration is used: the first digit denotes the
number of a chapter, the second one denotes the number of the section in the
chapter, and the last one denotes the number of an item (theorem, lemma,
formula) in the section.

For a complex number z, \bar{z} denotes the complex conjugate. R^m is the
m-dimensional Euclidean space. \Re and \Im denote the real and imaginary parts
of a complex quantity, respectively; f(x + 0) and f(x - 0) are respectively
the right-hand and left-hand limits; \to^{a.s.}, \to^{P} and \to^{D} denote,
respectively, convergence almost surely, convergence in probability and
convergence in distribution; '=^{a.s.}' means that an equality holds with
probability one. The weak convergence is denoted

    Lim_{n\to\infty} F_n(x) = F(x)

or

    F_n \to^w F.

If A is an event, then I_A is the indicator of A; the symbol * denotes
convolution. Where the multivariate case is considered, m always denotes the
dimension of the space. The sign □ means the end of a proof.

1

Basic properties of the characteristic functions

In this chapter, we summarize basic results concerning characteristic func-


tions. Most of these results are given without proof because the proofs can
be found in many textbooks and monographs: see (Cramér, 1962; Cuppens, 1975;
Feller, 1971; Galambos, 1988; Gnedenko & Kolmogorov, 1954; Ibragimov & Linnik,
1971; Kawata, 1972; Linnik & Ostrovskii, 1977; Loève, 1977; Lukacs, 1970;
Lukacs, 1983; Lukacs & Laha, 1964; Petrov, 1975; Prokhorov & Rozanov, 1969;
Ramachandran, 1967). Usually, proofs are presented for recent results
which are not contained in the mentioned books.

1.1. Definition and elementary properties


Let X be a real-valued random variable with distribution function F(x). The
characteristic function f(t) of the distribution function F(x) (or of the
random variable X) is defined as

    f(t) = \int_{-\infty}^{\infty} e^{itx} dF(x) = E e^{itX},   -\infty < t < \infty.

If F(x) is absolutely continuous with density p(x), then

    f(t) = \int_{-\infty}^{\infty} e^{itx} p(x) dx.

If X is discrete with P(X = x_n) = p_n, n = 1, 2, ..., then

    f(t) = \sum_{n=1}^{\infty} p_n e^{itx_n}.
Characteristic functions of the most frequently arising distributions can
be found in Appendix B.
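Although the book contains no code, the defining integral is easy to check
numerically. The sketch below (an illustration assuming NumPy is available; it
is not part of the original text) approximates f(t) = \int e^{itx} p(x) dx for
the standard normal density by a Riemann sum and compares the result with the
classical closed form e^{-t^2/2}:

```python
import numpy as np

def cf_from_density(p, t, x):
    """Approximate f(t) = integral of e^{itx} p(x) dx by a Riemann sum."""
    dx = x[1] - x[0]
    return np.sum(np.exp(1j * t * x) * p(x)) * dx

# standard normal density; its characteristic function is exp(-t^2/2)
phi = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

x = np.linspace(-10.0, 10.0, 20001)
t_values = [0.0, 0.5, 1.0, 2.0]
approx = [cf_from_density(phi, t, x) for t in t_values]
exact = [np.exp(-t**2 / 2) for t in t_values]
max_err = max(abs(a - e) for a, e in zip(approx, exact))
```

Note that f(0) = 1 and |f(t)| <= 1 come out automatically, in agreement with
Theorem 1.1.1 below.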


THEOREM 1.1.1. Any characteristic function f(t) satisfies the following
conditions:

(a) f(t) is uniformly continuous;

(b) f(0) = 1;

(c) |f(t)| \le 1 for all real t;

(d) f(-t) = \overline{f(t)}, where the horizontal bar denotes the complex conjugate.

THEOREM 1.1.2 (uniqueness theorem). Two distribution functions are identical
if and only if their characteristic functions are identical.

According to this theorem, there is a one-to-one correspondence between
characteristic functions and distribution functions.
A distribution is said to be a lattice distribution if it is concentrated on a
set of the form {a + nh, n = 0, ±1, ±2, ...}, where a and h are some real
numbers, h > 0.

THEOREM 1.1.3. A characteristic function f(t) corresponds to a lattice
distribution if and only if |f(t_0)| = 1 for some t_0 \ne 0.

Note that, due to Theorem 1.1.3, any periodic characteristic function is the
characteristic function of a lattice distribution. The converse is false (see
Appendix A, Example 3). In fact, the following theorem holds true.

THEOREM 1.1.4. Let f(t) be a characteristic function. Then the following
statements are equivalent:

(a) there exists \tau > 0 such that f(\tau) = 1;

(b) f(t) is periodic with period \tau;

(c) the distribution function F(x), corresponding to f(t), is purely discrete,
and all its points of discontinuity belong to the set {nh, n = 0, ±1, ...},
where h = 2\pi/\tau.

COROLLARY 1.1.1. A characteristic function f(t) corresponds to a lattice
distribution if and only if there exists a real number a such that the
function e^{iat} f(t) is periodic.

A characteristic function f(t) corresponds to a discrete distribution if and
only if it is almost periodic. This immediately follows from the definition of
an almost periodic function.

THEOREM 1.1.5. If f(t) is the characteristic function of a discrete
distribution, then

    \limsup_{|t|\to\infty} |f(t)| = 1.    (1.1.1)

THEOREM 1.1.6. If f(t) is the characteristic function of an absolutely
continuous distribution, then

    \lim_{|t|\to\infty} |f(t)| = 0.    (1.1.2)

In Theorems 1.1.5 and 1.1.6, the converse is not true: there exist singu-
lar distributions whose characteristic functions satisfy (1.1.1) or (1.1.2) (see
Appendix A, Examples 6 and 7).
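The contrast between (1.1.1) and (1.1.2) can be seen with two standard
examples (a numerical sketch assuming NumPy, not from the original text; the
Poisson and normal characteristic functions used are the classical closed
forms):

```python
import numpy as np

# Poisson(1), a lattice distribution: f(t) = exp(e^{it} - 1)
f_poisson = lambda t: np.exp(np.exp(1j * t) - 1.0)
# standard normal, absolutely continuous: f(t) = exp(-t^2/2)
f_normal = lambda t: np.exp(-t**2 / 2)

# |f| returns to 1 along the lattice periods t = 2*pi*k (Theorem 1.1.3),
# so the lim sup in (1.1.1) equals 1 ...
peaks = [abs(f_poisson(2 * np.pi * k)) for k in (1, 5, 50)]
# ... while |f| of the absolutely continuous law dies out, as in (1.1.2)
tail = abs(f_normal(50.0))
```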

THEOREM 1.1.7. A characteristic function is real if and only if it is even.

A distribution function F(x) is called symmetric if F(x) = 1 - F(-x - 0).

THEOREM 1.1.8. A distribution function is symmetric if and only if its
characteristic function is real and even.

THEOREM 1.1.9. Let f(t) be the characteristic function of a random variable X.
Then the characteristic function of the random variable aX + b, where a and b
are real numbers, is f(at) e^{ibt}.

THEOREM 1.1.10. Let X and Y be independent random variables with
characteristic functions f(t) and g(t). Then the characteristic function of
X + Y is f(t)g(t).

Note that the independence is not a necessary condition in this theorem: there
exist dependent random variables such that the characteristic function of
their sum is the product of the characteristic functions of the summands.
In terms of distribution functions, Theorem 1.1.10 means that the charac-
teristic function of the convolution of distribution functions is the product of
the corresponding characteristic functions. The converse is also true.

THEOREM 1.1.11 (convolution theorem). A distribution function F(x) is the
convolution of two distribution functions F_1(x) and F_2(x):

    F(x) = \int_{-\infty}^{\infty} F_1(x - y) dF_2(y) = \int_{-\infty}^{\infty} F_2(x - y) dF_1(y),

if and only if f(t) = f_1(t) f_2(t), where f(t), f_1(t) and f_2(t) are the
characteristic functions of F(x), F_1(x) and F_2(x) respectively.

The convolution theorem implies, in particular, that the absolute value of the
characteristic function of the convolution of two distributions decreases as
|t| \to \infty faster than the characteristic functions of the components.
This explains why the convolution is always 'smoother' (or at least no less
smooth) than each of the components: if distribution functions F_1(x) and
F_2(x) are absolutely continuous, then their convolution F(x) is also
absolutely continuous; if F_1(x) and F_2(x) have continuous densities, then
the density of F(x) is continuous; if the densities of F_1(x) and F_2(x) are
k times differentiable, then the density of F(x) is no less than k times
differentiable, etc. In addition, the maximum value, the maximum of the
derivative modulus, and the total variation of the density of F(x) do not
exceed those of the densities of F_1(x) and F_2(x). On the other hand, it
follows from the convolution theorem that the characteristic function of the
convolution tends to 1 as |t| \to 0 more slowly than the characteristic
functions of the components. This explains why the tail of the convolution is
always 'heavier' than those of the components.
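As a numerical illustration of Theorem 1.1.10 (a sketch assuming NumPy, not
from the original text): the sum of two independent U(0,1) variables has the
triangular density on [0, 2], and its characteristic function, computed by
quadrature, agrees with the square of the uniform characteristic function
(e^{it} - 1)/(it); the squared modulus also decays faster, as described above.

```python
import numpy as np

x = np.linspace(0.0, 2.0, 200001)
dx = x[1] - x[0]
tri = 1.0 - np.abs(x - 1.0)   # density of the sum of two independent U(0,1)

# characteristic function of U(0,1) (closed form, valid for t != 0)
f_unif = lambda t: (np.exp(1j * t) - 1.0) / (1j * t)

# quadrature CF of the triangular density vs. the product f_unif(t)^2
errs = [abs(np.sum(np.exp(1j * t * x) * tri) * dx - f_unif(t) ** 2)
        for t in (0.5, 1.0, 3.0, 10.0)]
max_err = max(errs)
```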

1.2. Continuity theorems and inversion formulas


Let F_1(x), F_2(x), ... be a sequence of distribution functions. By
definition, this sequence converges weakly to a distribution function F(x) if

    \lim_{n\to\infty} F_n(x) = F(x)

for all continuity points x of F(x). The weak convergence is denoted

    Lim_{n\to\infty} F_n(x) = F(x)

or

    F_n \to^w F.

THEOREM 1.2.1 (continuity theorem 1). Let F_1(x), F_2(x), ... be a sequence of
distribution functions and f_1(t), f_2(t), ... be the corresponding sequence
of characteristic functions. The sequence F_1(x), F_2(x), ... converges weakly
to some distribution function F(x) if and only if the sequence
f_1(t), f_2(t), ... converges at all points to some function f(t) which is
continuous at zero. In this case, f(t) is the characteristic function of F(x).

COROLLARY 1.2.1. A sequence F_1(x), F_2(x), ... of distribution functions
converges weakly to a distribution function F(x) if and only if the
corresponding sequence of characteristic functions f_1(t), f_2(t), ...
converges to the characteristic function f(t) of F(x) at all points t.

COROLLARY 1.2.2. If a sequence of characteristic functions f_1(t), f_2(t), ...
converges to some function f(t) at all points, and f(t) is continuous at zero,
then f(t) is a characteristic function.

Note that a sequence of characteristic functions may converge at all points to
a function which is not a characteristic function (see Example 28 of
Appendix A).

THEOREM 1.2.2 (continuity theorem 2). Let F_1(x), F_2(x), ... be a sequence of
distribution functions and f_1(t), f_2(t), ... be the corresponding sequence
of characteristic functions. The sequence F_1(x), F_2(x), ... converges weakly
to some distribution function F(x) if and only if the sequence
f_1(t), f_2(t), ... converges uniformly on each bounded interval to some
function f(t). In this case, f(t) is the characteristic function of F(x).

Theorem 1.2.2 means that the one-to-one correspondence between characteristic
functions and distribution functions is continuous if the topology of weak
convergence and the topology of uniform convergence on each bounded interval
are taken in the spaces of distribution functions and characteristic functions
respectively.

Theorems 1.2.1 and 1.2.2 imply, in particular, that if a sequence of
characteristic functions f_1(t), f_2(t), ... converges to a characteristic
function f(t) at all points, then it converges uniformly on each bounded
interval.

COROLLARY 1.2.3. Let F(x), F_1(x), F_2(x), ... be distribution functions and
f(t), f_1(t), f_2(t), ... be the corresponding characteristic functions. The
sequence F_1(x), F_2(x), ... converges weakly to F(x) if and only if

    \lim_{n\to\infty} \int_a^b f_n(t) dt = \int_a^b f(t) dt

for any a < b.

THEOREM 1.2.3 (inversion theorem). Let F(x) be a distribution function with
characteristic function f(t). Then

    F(b) - F(a) = \lim_{T\to\infty} \frac{1}{2\pi} \int_{-T}^{T} \frac{e^{-ita} - e^{-itb}}{it} f(t) dt

for any a and b which are continuity points of F(x).

Inversion formulas of other kinds are given by the next two theorems.

THEOREM 1.2.4. Let F(x) be a distribution function with characteristic
function f(t). If

    \int_{-\infty}^{\infty} \log(1 + |x|) dF(x) < \infty,

then

    F(x) = \frac{1}{2} - \frac{1}{\pi} \int_0^{\infty} \Im\bigl(e^{-itx} f(t)\bigr) \frac{dt}{t}

for all x which are continuity points of F(x).

Let v.p. denote the Cauchy principal value, i.e.,

    v.p. \int_{-\infty}^{\infty} = \lim_{T\to\infty} \lim_{\varepsilon\to 0} \left( \int_{-T}^{-\varepsilon} + \int_{\varepsilon}^{T} \right).

THEOREM 1.2.5. Let F(x) be a distribution function with characteristic
function f(t). Then

    F(x) = \frac{1}{2} - \frac{1}{2\pi} \, v.p. \int_{-\infty}^{\infty} \frac{e^{-itx}}{it} f(t) dt

for any x being a continuity point of F(x).

We can use the inversion formula of Theorem 1.2.5 at all points if we redefine
F(x) at discontinuity points as

    F(x) = \frac{F(x + 0) + F(x - 0)}{2}.
For absolutely continuous distributions, inversion formulas for the distri-
bution functions are supplemented by the inversion formula for the density
function.

THEOREM 1.2.6. Let f(t) be an absolutely integrable characteristic function.
Then the corresponding distribution function F(x) is absolutely continuous,
its density function p(x) is bounded and continuous, and

    p(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} f(t) dt.    (1.2.1)

Note that this theorem implies that if f(t) is an absolutely integrable
complex-valued continuous function such that the function

    p(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} f(t) dt

is a probability density function, then f(t) is a characteristic function,
namely, the characteristic function of the distribution with density p(x).
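Formula (1.2.1) is easy to test numerically. The sketch below (assuming NumPy;
the truncation limit T and the grid size are arbitrary choices of this
illustration, not of the text) recovers the standard normal density from its
characteristic function e^{-t^2/2}:

```python
import numpy as np

f = lambda t: np.exp(-t**2 / 2)   # CF of the standard normal law

def density_from_cf(f, x, T=40.0, n=80001):
    """Approximate (1.2.1): p(x) = (1/(2*pi)) * integral of e^{-itx} f(t) dt."""
    t = np.linspace(-T, T, n)
    dt = t[1] - t[0]
    return float(np.real(np.sum(np.exp(-1j * t * x) * f(t)) * dt / (2 * np.pi)))

pts = [0.0, 1.0, 2.5]
exact = [np.exp(-u**2 / 2) / np.sqrt(2 * np.pi) for u in pts]
approx = [density_from_cf(f, u) for u in pts]
max_err = max(abs(a - e) for a, e in zip(approx, exact))
```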

COROLLARY 1.2.4. Let f(t), f_1(t), f_2(t), ... be integrable characteristic
functions, and p(x), p_1(x), p_2(x), ... be the corresponding density
functions. If

    \int_{-\infty}^{\infty} |f_n(t) - f(t)| dt \to 0 as n \to \infty,

then the densities p_n(x) converge uniformly to p(x):

    \sup_x |p_n(x) - p(x)| \to 0 as n \to \infty.




If the characteristic function of an absolutely continuous distribution is not
integrable, then Theorem 1.2.6 cannot be applied for the calculation of the
density function. In this case, some modifications of (1.2.1) can be used, for
example,

    p(x) = \lim_{T\to\infty} \frac{1}{2\pi} \int_{-T}^{T} e^{-itx} \left( 1 - \frac{|t|}{T} \right) f(t) dt.

THEOREM 1.2.7. Let f(t) be the characteristic function of a lattice
distribution concentrated on a set {a + nh, n = 0, ±1, ±2, ...}. Denote the
jumps of the distribution function at the points a + nh by p_n. Then

    p_n = \frac{h}{2\pi} \int_{-\pi/h}^{\pi/h} e^{-i(a+nh)t} f(t) dt.
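Theorem 1.2.7 can be illustrated with the Poisson law (a = 0, h = 1), whose
characteristic function is the classical closed form exp(λ(e^{it} - 1)). The
following sketch (assuming NumPy; the quadrature grid is an arbitrary choice
of this example) recovers the Poisson probabilities from the characteristic
function:

```python
import numpy as np
from math import factorial

lam = 2.0
f = lambda t: np.exp(lam * (np.exp(1j * t) - 1.0))   # CF of Poisson(2)

def lattice_prob(f, n, a=0.0, h=1.0, m=20001):
    """Theorem 1.2.7 with the integral over [-pi/h, pi/h] done numerically."""
    t = np.linspace(-np.pi / h, np.pi / h, m)
    dt = t[1] - t[0]
    val = (h / (2 * np.pi)) * np.sum(np.exp(-1j * (a + n * h) * t) * f(t)) * dt
    return float(np.real(val))

probs = [lattice_prob(f, n) for n in range(5)]
exact = [np.exp(-lam) * lam**n / factorial(n) for n in range(5)]
max_err = max(abs(p - q) for p, q in zip(probs, exact))
```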

THEOREM 1.2.8. Let F(x) be a distribution function with characteristic
function f(t). Then

    F(x) - F(x - 0) = \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} e^{-itx} f(t) dt

for any x.

To conclude the section, we present several identities which are useful in
many situations.

THEOREM 1.2.9 (Parseval). Let F(x) and G(x) be distribution functions with
characteristic functions f(t) and g(t) respectively. Then

    \int_{-\infty}^{\infty} e^{-itx} g(x) dF(x) = \int_{-\infty}^{\infty} f(y - t) dG(y).

The theorem implies, in particular, that

    \int_{-\infty}^{\infty} f(t) dG(t) = \int_{-\infty}^{\infty} g(t) dF(t).

THEOREM 1.2.10 (Parseval-Plancherel). Let f(t) be the characteristic function
of an absolutely continuous distribution with density function p(x). Then
|f(t)|^2 is integrable if and only if p^2(x) is integrable, and, in this case,

    \int_{-\infty}^{\infty} p^2(x) dx = \frac{1}{2\pi} \int_{-\infty}^{\infty} |f(t)|^2 dt.

A discrete analog of this theorem can be formulated as follows.

THEOREM 1.2.11. Let F(x) be a distribution function with characteristic
function f(t) and jumps p_1, p_2, .... Then

    \sum_{n=1}^{\infty} p_n^2 = \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} |f(t)|^2 dt.

If F(x) has only a finite number of jumps, then the series on the left-hand
side should be replaced by the finite sum.
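For example (a numerical sketch assuming NumPy, not part of the original
text), for a fair Bernoulli law on {0, 1} with f(t) = (1 + e^{it})/2, the
limit in Theorem 1.2.11 should equal p_0^2 + p_1^2 = 1/2:

```python
import numpy as np

f = lambda t: 0.5 * (1.0 + np.exp(1j * t))   # CF of a fair Bernoulli on {0, 1}

def mean_sq_modulus(f, T, m=400001):
    """(1/(2T)) * integral of |f(t)|^2 over [-T, T], by quadrature."""
    t = np.linspace(-T, T, m)
    dt = t[1] - t[0]
    return float(np.sum(np.abs(f(t)) ** 2) * dt / (2 * T))

approx = mean_sq_modulus(f, T=2000.0)
exact = 0.5**2 + 0.5**2   # sum of squared jumps
err = abs(approx - exact)
```

The convergence in T is slow (of order 1/T), which is why a large truncation
limit is used here.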

COROLLARY 1.2.5. A distribution function F(x) is continuous if and only if the
corresponding characteristic function f(t) satisfies the condition

    \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} |f(t)|^2 dt = 0.
Another probabilistic variant of the Parseval-Plancherel theorem concerns


the difference of characteristic functions.

THEOREM 1.2.12 (Parseval-Plancherel). Let p(x) and q(x) be two probability
density functions with characteristic functions f(t) and g(t) respectively.
Then

    \int_{-\infty}^{\infty} [p(x) - q(x)]^2 dx = \frac{1}{2\pi} \int_{-\infty}^{\infty} |f(t) - g(t)|^2 dt,

provided that the integrals exist.

1.3. Criteria
We start with some general necessary and sufficient conditions, and then
present various sufficient conditions for a function to be the characteristic
function of a probability distribution. We also give some methods of con-
structing characteristic functions satisfying given properties. At the end of
the section, some necessary conditions are presented which supplement nec-
essary conditions given by Theorem 1.1.1. Other necessary conditions (in the
form of inequalities) are given in Section 1.4. Criteria related to unimodal
distributions will be given separately in Section 1.6.
A complex-valued function f(t) of the real variable t is called non-negative
definite if it is continuous and the sum

    \sum_{j=1}^{N} \sum_{k=1}^{N} f(t_j - t_k) z_j \bar{z}_k

is real and non-negative for any positive integer N, any real
t_1, t_2, ..., t_N, and any complex z_1, z_2, ..., z_N.

THEOREM 1.3.1 (Bochner-Khintchine). A complex-valued function f(t) of the real
variable t is a characteristic function if and only if

(a) f(t) is non-negative definite;

(b) f(0) = 1.

THEOREM 1.3.2 (Cramér). A bounded and continuous function f(t) is a
characteristic function if and only if

(a) f(0) = 1;

(b) \int_0^A \int_0^A f(t - s) e^{ix(t-s)} dt ds \ge 0 for all real x and all A > 0.
THEOREM 1.3.3 (Khintchine). A complex-valued function f(t) of the real
variable t is a characteristic function if and only if there exists a sequence
of complex-valued functions g_1(t), g_2(t), ..., satisfying the condition

    \int_{-\infty}^{\infty} |g_n(t)|^2 dt = 1,

and such that

    f(t) = \lim_{n\to\infty} \int_{-\infty}^{\infty} g_n(t + s) \overline{g_n(s)} ds

uniformly in each bounded interval.

For absolutely continuous distributions, this criterion has a simpler form.

THEOREM 1.3.4. A complex-valued function f(t) of the real variable t is the
characteristic function of an absolutely continuous distribution if and only
if it can be represented in the form

    f(t) = \int_{-\infty}^{\infty} g(t + s) \overline{g(s)} ds,

where g(t) is a complex-valued function satisfying the condition

    \int_{-\infty}^{\infty} |g(t)|^2 dt = 1.

The Khintchine criterion is very useful in constructing examples of
characteristic functions satisfying given conditions, for instance, those
vanishing on some sets, say, outside a neighbourhood of the origin. An
advantage of the Khintchine criterion over the Pólya one (see Theorem 1.3.8
below), which is also often used in constructing such functions, consists in
the possibility to obtain smooth characteristic functions, i.e., distributions
having as many moments as required, whereas all distributions satisfying the
Pólya criterion do not have expectation. A special case of the Khintchine
criterion (Zolotarev, 1967) gives characteristic functions vanishing outside
the interval [-1, 1]: let g(t) be a real-valued even function such that
g(t) = 0 for |t| > 1/2 and \int_{-\infty}^{\infty} g^2(t) dt = 1; then the
convolution

    f(t) = \int_{-\infty}^{\infty} g(t + s) g(s) ds

is the characteristic function of the absolutely continuous distribution with
the density

    p(x) = \frac{2}{\pi} \left( \int_0^{1/2} g(t) \cos(tx) dt \right)^2.
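The simplest admissible choice in this construction is g(t) = 1 on
[-1/2, 1/2] and 0 elsewhere, for which the convolution is the triangular
function max(1 - |t|, 0). The sketch below (assuming NumPy; the grids are
arbitrary choices of this example, not of the text) checks this and verifies
that the resulting density integrates to one:

```python
import numpy as np

# simplest admissible g: real, even, zero for |t| > 1/2, integral of g^2 = 1
g = lambda t: np.where(np.abs(t) <= 0.5, 1.0, 0.0)

s = np.linspace(-1.0, 1.0, 200001)
ds = s[1] - s[0]

def f(t):
    """The convolution f(t) = integral of g(t + s) g(s) ds."""
    return float(np.sum(g(t + s) * g(s)) * ds)

# f should equal max(1 - |t|, 0): a CF vanishing outside [-1, 1]
tri_err = max(abs(f(t) - max(1.0 - abs(t), 0.0)) for t in (0.0, 0.3, 0.9, 1.5))

# for this g the displayed density formula gives p(x) = (2/pi)(sin(x/2)/x)^2;
# it should integrate to 1 (p is even, so integrate over x > 0 and double)
p = lambda x: (2.0 / np.pi) * (np.sin(x / 2.0) / x) ** 2
x = np.linspace(1e-3, 400.0, 400000)
mass = 2.0 * float(np.sum(p(x)) * (x[1] - x[0]))
```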

A criterion similar to Theorem 1.3.4 (Berman, 1975) is formulated in terms of
the covariance function. A complex-valued Borel function R(s, t),
-\infty < s, t < \infty, is a covariance function if for any positive integer
N, any real s_1, ..., s_N and any complex u_1, ..., u_N,

    \sum_{i=1}^{N} \sum_{j=1}^{N} R(s_i, s_j) u_i \bar{u}_j \ge 0.
THEOREM 1.3.5. Let R(s, t) be a covariance function satisfying the condition

    \int_{-\infty}^{\infty} R(s, s) ds < \infty.    (1.3.1)

Then the function

    f(t) = \frac{\int_{-\infty}^{\infty} R(s, s + t) ds}{\int_{-\infty}^{\infty} R(s, s) ds}    (1.3.2)

is the characteristic function of an absolutely continuous distribution.

Conversely, if f(t) is the characteristic function of an absolutely continuous
distribution, then there exists a covariance function R(s, t) satisfying
(1.3.1), such that f(t) is representable as (1.3.2).

PROOF. Let R(s, t) be a covariance function satisfying (1.3.1). Consider a
complex Gaussian process X(t), -\infty < t < \infty, associated with R(s, t),
i.e., such that EX(t) = 0 for all t and E\overline{X(s)}X(t) = R(s, t) for all
s, t (such a process exists due to the Kolmogorov existence theorem).

It follows from (1.3.1) that X(t) belongs to L_2 almost surely, and so there
is a measurable version of the Fourier transform process

    \hat{X}(u) = \int_{-\infty}^{\infty} e^{ius} X(s) ds,   -\infty < u < \infty.

We have

    \int_{-\infty}^{\infty} R(s, s + t) ds = \int_{-\infty}^{\infty} E\overline{X(s)}X(s + t) ds
        = E \int_{-\infty}^{\infty} \overline{X(s)} X(s + t) ds
        = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iut} E|\hat{X}(u)|^2 du,

hence

    \frac{\int_{-\infty}^{\infty} R(s, s + t) ds}{\int_{-\infty}^{\infty} R(s, s) ds}
        = \frac{\int_{-\infty}^{\infty} e^{-iut} E|\hat{X}(u)|^2 du}{\int_{-\infty}^{\infty} E|\hat{X}(u)|^2 du}.    (1.3.3)

The right-hand side of (1.3.3) is the Fourier transform of the function

    p(y) = \frac{E|\hat{X}(-y)|^2}{\int_{-\infty}^{\infty} E|\hat{X}(u)|^2 du},

which, evidently, is a probability density function; therefore f(t), which is
the left-hand side of (1.3.3), is the characteristic function of an absolutely
continuous distribution.

Conversely, let f(t) be the characteristic function of an absolutely
continuous distribution. Then, due to Theorem 1.3.4, there exists a
complex-valued function g(t) satisfying the condition

    \int_{-\infty}^{\infty} |g(t)|^2 dt = 1,

such that

    f(t) = \int_{-\infty}^{\infty} g(t + s) \overline{g(s)} ds.

We can set

    R(s, t) = \overline{g(s)} g(t). □


We also present a quite recent criterion, due to (Trigub, 1989).

THEOREM 1.3.6. A complex-valued function f(t) of the real variable t is a
characteristic function if and only if

(a) f(0) = 1;

(b) f(t) is continuous;

(c) the improper integral

    \int_0^{\infty} [f(t) - f(-t)] \frac{dt}{t}

converges;

(d) \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} f(t) dt \ge 0;

(e) there exists k_0 such that for any k \ge k_0 and any x \ne 0,

    (sgn x)^{k+1} \int_{-\infty}^{\infty} \frac{f(t)}{(x + it)^{k+1}} dt \ge 0.    (1.3.4)

To prove the theorem, we need two lemmas.

LEMMA 1.3.1. (a) For any real x \ne 0, any complex z such that \Im z \ne 0,
and any N > |z|,

    \frac{1}{2\pi i} \int_{-N}^{N} \frac{e^{ixt}}{t - z} dt = \frac{e^{ixz}}{2} (sgn x + sgn \Im z) + \frac{\theta_1}{(N - |z|)|x|}.

(b) For any real x \ne 0 and any N \ge 1,

    \int_{-N}^{N} \frac{e^{it}}{t + ix} dt = \frac{e^{iN}}{i(N + ix)} + \frac{e^{-iN}}{i(N - ix)} + \frac{\theta_2}{1 + x^2},

where |\theta_1| \le c and |\theta_2| \le c, c is an absolute constant.

PROOF. Due to the Cauchy theorem, for N > |z|,

    \frac{1}{2\pi i} \int_{-N}^{N} \frac{e^{ixt}}{t - z} dt = \frac{e^{ixz}}{2} (sgn x + sgn \Im z) - \frac{1}{2\pi i} \int_{\Gamma_N} \frac{e^{ix\zeta}}{\zeta - z} d\zeta.

Here \Gamma_N is the half-circle of radius N centered at zero (the upper
half-circle if x > 0, and the lower one if x < 0). In the first case (x > 0),
for instance,

    \left| \int_{\Gamma_N} \frac{e^{ix\zeta}}{\zeta - z} d\zeta \right| \le \frac{N}{N - |z|} \int_0^{\pi} e^{-Nx \sin\varphi} d\varphi,

and, since \sin\varphi \ge 2\varphi/\pi for 0 \le \varphi \le \pi/2,

    \int_0^{\pi} e^{-Nx \sin\varphi} d\varphi = 2 \int_0^{\pi/2} e^{-Nx \sin\varphi} d\varphi \le 2 \int_0^{\pi/2} e^{-2Nx\varphi/\pi} d\varphi < \frac{\pi}{Nx}.

Hence |\theta_1| \le c.

To prove (b), let us integrate by parts. We have

    \int_{-N}^{N} \frac{e^{it}}{t + ix} dt = \frac{e^{iN}}{i(N + ix)} + \frac{e^{-iN}}{i(N - ix)} + \frac{1}{i} \int_{-N}^{N} \frac{e^{it}}{(t + ix)^2} dt.

If |x| \ge N/2 \ge 1/2, then, integrating by parts once more, we see that the
absolute value of the last integral does not exceed

    \frac{2}{N^2 + x^2} + \frac{4}{x^2} \le \frac{c}{1 + x^2}.

If |x| < N/2, then we use (as above) the Cauchy theorem with \Gamma_N being
the upper half-circle:

    \left| \int_{\Gamma_N} \frac{e^{i\zeta}}{(\zeta + ix)^2} d\zeta \right| \le \frac{N}{(N - |x|)^2} \int_0^{\pi} e^{-N \sin\varphi} d\varphi < \frac{\pi}{(N - |x|)^2} \le \frac{c}{1 + x^2}. □


LEMMA 1.3.2. Let f(t) satisfy conditions (a)-(c) of Theorem 1.3.6. Set

    g(x) = v.p. \int_{-\infty}^{\infty} \frac{f(t) dt}{x + it} = \lim_{M\to\infty} \int_{-M}^{M} \frac{f(t) dt}{x + it}.

If

    \lim_{|x|\to\infty} g(x) = 0,

then for any real y \ne 0,

    \int_{-\infty}^{\infty} e^{ixy} g(x) dx = 2\pi i \, sgn y \int_0^{\infty} f(-t \, sgn y) e^{-t|y|} dt.

PROOF. Denote

    g_M(x) = \int_{-M}^{M} \frac{f(t) dt}{x + it}.

It is seen from the equality

    g_M(x) - g_N(x) = \int_{N}^{M} (f(t) + f(-t)) \frac{x \, dt}{x^2 + t^2} - i \int_{N}^{M} (f(t) - f(-t)) \frac{t \, dt}{x^2 + t^2},

where M > N > 0, that g_M(x) \to g(x) as M \to \infty uniformly in each
bounded interval (we use the boundedness of f(t) in the first integral and the
Abel test in the second one). Therefore

    \int_{-N}^{N} e^{ixy} g(x) dx = \lim_{M\to\infty} \int_{-N}^{N} e^{ixy} g_M(x) dx = \int_{-\infty}^{\infty} f(t) dt \int_{-Ny}^{Ny} \frac{e^{iu} du}{u + ity}.

Let us take the limit of the right-hand side as N \to \infty. By virtue of
Lemma 1.3.1 (N > 1/|y|),

    \int_{-Ny}^{Ny} \frac{e^{iu} du}{u + ity} = \frac{e^{iNy}}{i(N + it)y} + \frac{e^{-iNy}}{i(N - it)y} + \frac{\theta_N(y, t)}{1 + t^2 y^2},   |\theta_N(y, t)| \le c.

In view of the boundedness of f(t) and the Lebesgue theorem,

    \lim_{N\to\infty} \int_{-\infty}^{\infty} f(t) \frac{\theta_N(y, t)}{1 + t^2 y^2} dt = \int_{-\infty}^{\infty} f(t) \lim_{N\to\infty} \frac{\theta_N(y, t)}{1 + t^2 y^2} dt

(the existence of the limit in the right-hand side follows from the previous
equality). Thus, it remains to verify that

    \lim_{N\to\infty} \int_{-\infty}^{\infty} f(t) \left( \frac{e^{iNy}}{N + it} + \frac{e^{-iNy}}{N - it} \right) dt = \int_{-\infty}^{\infty} f(t) \lim_{N\to\infty} \left( \frac{e^{iNy}}{N + it} + \frac{e^{-iNy}}{N - it} \right) dt.

The right-hand side of this relation is equal to zero, and the left-hand side
is

    \lim_{N\to\infty} [g(N) e^{iNy} - g(-N) e^{-iNy}] = 0.

Thus, by virtue of Lemma 1.3.1,

    \int_{-\infty}^{\infty} e^{ixy} g(x) dx = \int_{-\infty}^{\infty} f(t) dt \; v.p. \int_{-\infty}^{\infty} \frac{e^{ixy}}{x + it} dx = \pi i \int_{-\infty}^{\infty} f(t) e^{ty} (sgn y - sgn t) dt
        = 2\pi i \, sgn y \int_0^{\infty} f(-t \, sgn y) e^{-t|y|} dt. □

PROOF OF THEOREM 1.3.6. The necessity of conditions (a)-(d) is obvious (see
(Lukacs, 1970)). Let us prove the necessity of condition (e). Let

    f(t) = \int_{-\infty}^{\infty} e^{ixt} dF(x)

for some distribution function F(x). For any real y \ne 0, by virtue of the
Fubini theorem,

    g_M(y) = \int_{-M}^{M} \frac{f(t) dt}{y + it} = \int_{-\infty}^{\infty} dF(x) \int_{-M}^{M} \frac{e^{itx}}{y + it} dt.

By Lemma 1.3.1 (z = iy), the inner integral is bounded by a constant depending
only on y and has a limit as M \to \infty; therefore, for y \ne 0,

    g(y) = v.p. \int_{-\infty}^{\infty} \frac{f(t) dt}{y + it} = \pi \int_{-\infty}^{\infty} e^{-yx} (sgn x + sgn y) dF(x),

which implies (for y > 0 and y < 0, respectively)

    g(y) = 2\pi \int_0^{\infty} e^{-yx} dF(x)    (1.3.5)

and

    g(y) = -2\pi \int_{-\infty}^{0} e^{-yx} dF(x).    (1.3.6)

Therefore, for any k \ge 0 and y \ne 0,

    (-1)^k (sgn y)^{k+1} g^{(k)}(y) \ge 0

(see (1.3.4)). The necessity is thus proved.

Let us turn to the proof of sufficiency. First note that if g(x) is bounded on
the positive half-line and g^{(n)}(x) \ge 0 for all x > x_0 for some n \ge 2,
then g^{(n-1)}(x) \le 0 for x > x_0. Indeed, for any h > 0 and x > x_0,

    g(x + h) = \sum_{v=0}^{n-1} \frac{g^{(v)}(x)}{v!} h^v + \frac{g^{(n)}(\xi)}{n!} h^n \ge \sum_{v=0}^{n-1} \frac{g^{(v)}(x)}{v!} h^v,

which, letting h \to \infty and using the boundedness of g(x), implies that
g^{(n-1)}(x) \le 0. Therefore, in view of the conditions of the theorem (the
boundedness of g(x) was proved in the proof of Lemma 1.3.2), for any k \ge 0
and any x \ne 0,

    (-1)^k (sgn x)^{k+1} g^{(k)}(x) \ge 0.

From the theorem on completely monotone functions (Feller, 1971), taking into
account that one-sided limits of g(x) exist at zero, we conclude that there
exist bounded increasing functions F_1(x) and F_2(x) such that

    g(t) = \int_0^{\infty} e^{-tx} dF_1(x),   t > 0,    (1.3.7)

    g(t) = -\int_{-\infty}^{0} e^{-tx} dF_2(x),   t < 0.    (1.3.8)

Without loss of generality, we may assume that F_1(0) = F_2(0). Set

    F(x) = \frac{1}{2\pi} F_1(x)

for x > 0 and

    F(x) = \frac{1}{2\pi} F_2(x)

for x < 0. Then the function

    f_0(t) = \int_{-\infty}^{\infty} e^{itx} dF(x)

is a characteristic function. Indeed,

    f_0(0) = \lim_{t\to 0+0} \frac{1}{2\pi} \left( \int_0^{\infty} e^{-tx} dF_1(x) + \int_{-\infty}^{0} e^{tx} dF_2(x) \right)
        = \lim_{t\to 0+0} \frac{1}{2\pi} [g(t) - g(-t)]
        = \lim_{t\to 0+0} \frac{1}{2\pi} \int_{-\infty}^{\infty} f(x) \frac{2t \, dx}{t^2 + x^2}
        = \frac{1}{\pi} \lim_{t\to 0+0} \int_{-\infty}^{\infty} f(tu) \frac{du}{1 + u^2} = 1.

In view of (1.3.5)-(1.3.8), for t \ne 0,

    v.p. \int_{-\infty}^{\infty} \frac{f_0(t) dt}{x + it} = g(x).

Now it suffices to prove that f(t) = f_0(t), i.e., that g(x) uniquely
determines f(t). Assuming g(x) \equiv 0, we, due to Lemma 1.3.2, obtain

    \int_0^{\infty} f(-t \, sgn y) e^{-t|y|} dt = 0

for any y \ne 0, or, after the substitution t = \ln(1/u),

    \int_0^1 f(\ln u \cdot sgn y) u^{|y| - 1} du = 0.

Considering this relation for all complex y \ne 0 and taking into account that
f(t) is continuous, we obtain f(t) \equiv 0. □

A number of sufficient conditions for characteristic functions can be obtained
if we study how characteristic functions change under some transformations of
the corresponding random variables and distribution functions.

THEOREM 1.3.7. Let f(t) be a characteristic function. Then the following functions are also characteristic functions:

(a) f(−t) (or, which is the same, the complex conjugate of f(t));

(b) f^n(t) for any non-negative integer n;

(c) ℜf(t);

(d) |f(t)|²;

(e) e^{ibt} f(at) for any real a and b.

These characteristic functions correspond to the following transformations:

(a) if X is a random variable with characteristic function f(t), then f(−t) is the characteristic function of −X;

(b) if X₁, …, Xₙ are independent identically distributed random variables with the common characteristic function f(t), then f^n(t) is the characteristic function of the sum X₁ + … + Xₙ;

(c) if F(x) is a distribution function with characteristic function f(t), then ℜf(t) is the characteristic function of the distribution function ½(1 + F(x) − F(−x − 0));

(d) if X₁ and X₂ are independent identically distributed random variables with characteristic function f(t), then X₁ − X₂ has characteristic function |f(t)|² (symmetrization);

(e) if X is a random variable with characteristic function f(t), then e^{ibt} f(at) is the characteristic function of aX + b.
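A quick numerical illustration of item (d) (the two-point distribution used here is our own example, not taken from the text): for P(X = 0) = P(X = 1) = 1/2, the characteristic function of X₁ − X₂ should coincide with |f(t)|².

```python
import cmath

def f(t):
    # cf of X with P(X=0) = P(X=1) = 1/2: (1 + e^{it})/2
    return (1 + cmath.exp(1j * t)) / 2

def f_diff(t):
    # X1 - X2 takes values -1, 0, 1 with probabilities 1/4, 1/2, 1/4
    return 0.25 * cmath.exp(-1j * t) + 0.5 + 0.25 * cmath.exp(1j * t)

# symmetrization: cf of X1 - X2 equals |f(t)|^2
for t in [-2.0, -0.5, 0.0, 1.3, 7.0]:
    assert abs(f_diff(t) - abs(f(t)) ** 2) < 1e-12
```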

THEOREM 1.3.8. Let f(t, a), t, a ∈ ℝ¹, be a function of two real variables satisfying the following conditions:

(a) for any fixed a the function f_a(t) = f(t, a) is a characteristic function;

(b) for any fixed t the function f_t(a) = f(t, a) is a measurable function.

Then for any distribution function H(x), the function

    g(t) = ∫_{−∞}^∞ f(t, a) dH(a)

is a characteristic function.

In this theorem, the characteristic function g(t) is called the mixture of the characteristic functions f(t, a). A wide class of easily applicable sufficient conditions can be obtained from this theorem as its partial cases. We point out some of these conditions.

COROLLARY 1.3.1. Let f₁(t), f₂(t), … be characteristic functions and a₁, a₂, … be non-negative numbers such that Σ_{n=1}^∞ aₙ = 1. Then the function

    f(t) = Σ_{n=1}^∞ aₙ fₙ(t)

is a characteristic function.

COROLLARY 1.3.2. Let f(t) be a characteristic function. Then

    g(t) = 1/(2 − f(t))

is also a characteristic function.

PROOF. We have

    g(t) = 1/(2 − f(t)) = (1/2) · 1/(1 − f(t)/2) = Σ_{n=0}^∞ f^n(t)/2^{n+1},

and hence, by Corollary 1.3.1, g(t) is a characteristic function.
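The geometric-series expansion used in this proof is easy to verify numerically (the choice f(t) = cos t, the cf of ±1 with equal weights, is our own example):

```python
import math

def g(t):
    # closed form 1/(2 - f(t)) with f(t) = cos t
    return 1.0 / (2.0 - math.cos(t))

def g_series(t, terms=200):
    # partial sum of sum_{n>=0} f(t)^n / 2^{n+1}; ratio |f(t)/2| <= 1/2
    return sum(math.cos(t) ** n / 2 ** (n + 1) for n in range(terms))

for t in [0.0, 0.7, 2.0, math.pi]:
    assert abs(g(t) - g_series(t)) < 1e-12
```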

COROLLARY 1.3.3. Let f(t) be a characteristic function. Then

    g(t) = (p/t^p) ∫₀^t f(u) u^{p−1} du

is also a characteristic function for any p > 0.

PROOF. Let F(x) be the distribution function corresponding to the characteristic function f(t). Consider the function

    r_t(x) = p x^{p−1}/t^p,  0 < x < t;  r_t(x) = 0 otherwise.

This function is a probability density (this can be examined directly) for all t > 0 and p > 0. Denote its characteristic function by h_t(u). It is easy to see that h_t(u) = h₁(tu). Using the Parseval theorem (Theorem 1.2.9) we obtain

    g(t) = (p/t^p) ∫₀^t f(u) u^{p−1} du = ∫_{−∞}^∞ f(u) r_t(u) du
         = ∫_{−∞}^∞ h_t(u) dF(u) = ∫_{−∞}^∞ h₁(tu) dF(u).

Thus, by virtue of Theorem 1.3.8, g(t) is a characteristic function.

The case p = 1 is especially important. We shall return to this case in Section 1.6.
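For the important case p = 1, the transform can be checked in a concrete instance (our own example): with f(u) = cos u, the cf of ±1 with equal weights, g(t) = (1/t)∫₀^t cos u du = sin(t)/t, which is the cf of the uniform distribution on [−1, 1].

```python
import math

def g(t, steps=20000):
    # midpoint Riemann sum for (1/t) * int_0^t cos(u) du
    h = t / steps
    s = sum(math.cos((k + 0.5) * h) for k in range(steps)) * h
    return s / t

for t in [0.5, 1.0, 3.0]:
    assert abs(g(t) - math.sin(t) / t) < 1e-6
```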

COROLLARY 1.3.4. Let f(t) be a characteristic function. Then for any p > 0

    g(t) = e^{p(f(t) − 1)}

is also a characteristic function.



The following sufficient condition, which is due to Pólya (Pólya, 1949), has found wide applications.

THEOREM 1.3.9. Any real-valued and continuous function f(t) satisfying the conditions

(a) f(0) = 1;

(b) f(t) = f(−t);

(c) f(t) is convex for t > 0;

(d) lim_{t→∞} f(t) = 0

is a characteristic function of an absolutely continuous distribution.
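As a concrete instance (our own example, not from the text): f(t) = e^{−|t|} satisfies conditions (a)–(d) and is in fact the cf of the standard Cauchy law. A crude numerical inversion, p(x) = (1/2π)∫ f(t)cos(tx) dt, recovers the Cauchy density 1/(π(1 + x²)), illustrating the absolute continuity asserted by the theorem.

```python
import math

def density(x, T=40.0, steps=80000):
    # (1/2pi) int_{-T}^{T} e^{-|t|} cos(tx) dt, using evenness of the
    # integrand: = (1/pi) int_0^T e^{-t} cos(tx) dt (midpoint rule)
    h = T / steps
    s = sum(math.exp(-(k + 0.5) * h) * math.cos((k + 0.5) * h * x)
            for k in range(steps)) * h
    return s / math.pi

for x in [0.0, 1.0, 2.5]:
    assert abs(density(x) - 1.0 / (math.pi * (1 + x * x))) < 1e-4
```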

COROLLARY 1.3.5. Any real-valued and continuous function f(t) satisfying the conditions

(a) f(0) = 1;

(b) f(t) = f(−t);

(c) f(t) is convex for t > 0;

(d) f(t) ≥ 0

is a characteristic function.

COROLLARY 1.3.6. Let g(t) be a real-valued, even, twice differentiable function such that

(a) g(0) = 0;

(b) lim_{t→∞} g(t) = −∞;

(c) g″(t) + [g′(t)]² ≥ 0 for all t > 0.

Then

    f(t) = e^{g(t)}

is a characteristic function.

COROLLARY 1.3.7. Let g(t) be a real-valued, even, twice differentiable function such that

(a) g(0) = 0;

(b) g′(t) and g″(t) are bounded;

(c) g(t) = O(t) as t → ∞.

Then

    f(t) = exp{−λ|t| + g(t)}

is a characteristic function, provided λ > 0 is sufficiently large.

The class of characteristic functions satisfying the conditions of Theorem 1.3.9 (the Pólya characteristic functions) is a subset of the class of characteristic functions given by the following criterion, which follows from Theorem 1.3.5.

THEOREM 1.3.10. Let g(u) be a non-negative measurable function satisfying the condition

    ∫₀^∞ g(u) du = 1;

then

    f(t) = ∫₀^∞ max{0, 1 − |t|/u} g(u) du

is the characteristic function of an absolutely continuous distribution.

A number of sufficient conditions for characteristic functions of lattice distributions can be obtained on the basis of the following theorem.

THEOREM 1.3.11. Let f(t) be a characteristic function such that f(t) = 0 for |t| ≥ a for some a > 0. Then any periodic function g_T(t) with period T ≥ 2a, coinciding with f(t) on the interval [−T/2, T/2], is the characteristic function of a lattice distribution.

Theorems 1.3.9 and 1.3.11 immediately imply the following criterion.

THEOREM 1.3.12. Let f(t) be a real-valued function satisfying the conditions

(a) f(0) = 1;

(b) f(t) = f(−t);

(c) there exists a > 0 such that f(t) is convex on (0, a);

(d) f(t) is periodic with period 2a;

(e) f(t) ≥ 0 on (0, a) and f(a) = 0.

Then f(t) is the characteristic function of a lattice distribution.

Theorems 1.3.11 and 1.3.12 are very usefully combined with the following
criterion.

THEOREM 1.3.13. Let f(t) be a periodic (say, with period T) characteristic function. Then for any α > 0 satisfying the condition

    α/(1 + α) ≤ (1/T) ∫₀^T f(t) dt,    (1.3.9)

the function

    (1 + α) f(t) − α    (1.3.10)

is a characteristic function as well.

PROOF. By virtue of Theorem 1.1.4, the distribution corresponding to f(t) is concentrated on the set {2πn/T, n = 0, ±1, ±2, …}. Let pₙ, n = 0, ±1, ±2, …, be the weights of the points 2πn/T. Then

    f(t) = Σₙ pₙ e^{2πint/T}    (1.3.11)

and (Theorem 1.2.7)

    p₀ = (1/T) ∫₀^T f(t) dt.    (1.3.12)

Consider the distribution concentrated at the same points 2πn/T, n = 0, ±1, ±2, …, with weights q₀ = p₀ + αp₀ − α and qₙ = (1 + α)pₙ, n = ±1, ±2, … This is indeed a probability distribution because, in view of (1.3.9) and (1.3.12),

    q₀ = (1 + α)p₀ − α ≥ (1 + α) α/(1 + α) − α = 0

(for n ≠ 0, obviously, qₙ ≥ 0), and

    Σₙ qₙ = p₀ + αp₀ − α + (1 + α) Σ_{n≠0} pₙ = p₀ + αp₀ − α + (1 + α)(1 − p₀) = 1.

But taking (1.3.11) into account, we see that the characteristic function of this distribution is

    g(t) = p₀ + αp₀ − α + (1 + α) Σ_{n≠0} pₙ e^{2πint/T}
         = (1 + α) [ p₀ + Σ_{n≠0} pₙ e^{2πint/T} ] − α
         = (1 + α) f(t) − α,

which coincides with (1.3.10).
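The construction is easy to trace on an example of our own choosing: f(t) = (1 + cos t)/2 is periodic with T = 2π and p₀ = (1/T)∫₀^T f(t) dt = 1/2, so (1.3.9) holds for α ≤ 1; with α = 1, (1 + α)f(t) − α = cos t, the cf of the same lattice distribution with the atom at 0 removed.

```python
import math

def f(t):
    # periodic cf with atoms 1/2 at 0 and 1/4 at each of +-1
    return (1 + math.cos(t)) / 2

alpha = 1.0
p0 = 0.5  # = (1/2pi) * int_0^{2pi} f(t) dt
assert alpha / (1 + alpha) <= p0  # condition (1.3.9)

for t in [0.0, 0.3, 1.0, 2.0]:
    g = (1 + alpha) * f(t) - alpha
    assert abs(g - math.cos(t)) < 1e-12  # (1.3.10) is again a cf
```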



The following sufficient condition is connected with derivatives of characteristic functions. It turns out that each derivative of even order is itself a characteristic function up to a constant factor. More exactly, the following theorem holds true.

THEOREM 1.3.14. If a characteristic function f(t) is 2n times differentiable, then the function

    g(t) = f^{(2n)}(t) / f^{(2n)}(0)

is a characteristic function.

Now we turn to necessary conditions for characteristic functions. Some necessary conditions are contained in Sections 1.1 and 1.4. Here we give some other necessary conditions, in particular, those related to the limit behavior of the characteristic function as t → 0 and |t| → ∞.

THEOREM 1.3.15. For any characteristic function f(t),

    limsup_{|t|→∞} ℜf(t) ≥ 0.

PROOF. Suppose the contrary:

    limsup_{|t|→∞} ℜf(t) < 0.

Then there exist ε > 0 and a > 0 such that ℜf(t) < −ε for |t| ≥ a. Since u(t) = ℜf(t) is a characteristic function, it is a positive definite function, i.e., for any positive integer N and any real t₁, …, t_N and complex ξ₁, …, ξ_N,

    S = Σ_{j=1}^N Σ_{k=1}^N u(t_j − t_k) ξ_j ξ̄_k ≥ 0.

Let us take N > 1/ε + 1 and set ξ₁ = ξ₂ = … = ξ_N = 1, t_j = ja, j = 1, 2, …, N. Then

    S = N u(0) + Σ_{j≠k} u((j − k)a) < N − εN(N − 1) < N − N = 0.

This contradiction proves the theorem.

Although, due to this theorem, a characteristic function cannot possess a negative limit as |t| → ∞, it can be negative for all sufficiently large t (see Example 10 of Appendix A). There is an open question, whether a characteristic function f(t) exists such that

    limsup_{|t|→∞} ℜf(t) < | liminf_{|t|→∞} ℜf(t) |

and

    liminf_{|t|→∞} ℜf(t) < 0.

THEOREM 1.3.16. If a characteristic function f(t) satisfies the condition f(t) = 1 + o(t²) as t → 0, then f(t) ≡ 1.

THEOREM 1.3.17 (Marcinkiewicz). If P(t) is a polynomial of degree m > 2, then the function f(t) = exp{P(t)} cannot be a characteristic function.

1.4. Inequalities
Various inequalities for characteristic functions are the subject of Chapter 2.
However, some general inequalities are given in this section. They can be
considered as necessary conditions for characteristic functions, therefore they
supplement the results of the preceding section. We also present several well-
known estimates of the closeness of distribution functions in terms of the
closeness of the corresponding characteristic functions. Recent developments
in this area are given in the next chapter.

THEOREM 1.4.1. For any characteristic function f(t) and any positive integer n,

    1 − ℜf(nt) ≤ n{1 − [ℜf(t)]^n} ≤ n²[1 − ℜf(t)]

for all t ∈ (−∞, ∞).

PROOF. First let us prove the following elementary inequality:

    n cos^n x ≤ (n − 1) + cos(nx).    (1.4.1)

It is easy to show by induction that the inequality

    |sin(nx)| ≤ n |sin x|    (1.4.2)

holds for all real x and all positive integers n. Let

    φ(x) = n cos^n x − cos(nx);

we have to show that φ(x) ≤ n − 1. We have

    φ′(x) = −n² cos^{n−1}x sin x + n sin(nx).

Therefore φ(x) can have a (relative) maximum only if either

    sin x = 0

or

    sin x ≠ 0  and  n cos^{n−1}x = sin(nx)/sin x.

In the former case, x = πk (k is an integer), hence φ(πk) = (n − 1)(−1)^{nk} ≤ n − 1. In the latter case,

    φ(x) = cos x · sin(nx)/sin x − cos(nx) = sin((n − 1)x)/sin x,

so that we see from (1.4.2) that

    φ(x) ≤ n − 1,

and (1.4.1) is proved.

To prove the theorem, we have to distinguish two cases.

(a) n is an even integer. In this case, x^n is a convex function, and using the Jensen inequality we obtain

    [ℜf(t)]^n = ( ∫_{−∞}^∞ cos(tx) dF(x) )^n ≤ ∫_{−∞}^∞ cos^n(tx) dF(x),

and hence, in view of (1.4.1),

    n [ℜf(t)]^n ≤ ∫_{−∞}^∞ [(n − 1) + cos(ntx)] dF(x);

hence

    n[ℜf(t)]^n ≤ (n − 1) + ℜf(nt),

which is what we set out to prove.

(b) n is an odd integer. The assertion is trivial if n = 1 or if n{1 − [ℜf(t)]^n} ≥ 2 (the left-hand side of the inequality to be proved does not exceed 2). Without loss of generality, we can therefore assume

    n − 2 < n[ℜf(t)]^n.

Since n ≥ 3, we have 1/3 < [ℜf(t)]^n and 0 < ℜf(t).

Consider the function

    ψ(z) = z^n − a^n − n a^{n−1}(z − a)    (1.4.3)

for |z| ≤ 1 and 0 < a < 1 such that 1 ≤ n a^{n−1} and n − 2 ≤ n a^n. An elementary computation shows that ψ(z) ≥ 0 for |z| ≤ 1.

Let us set a = ℜf(t), z = cos(tx) in (1.4.3). Then

    cos^n(tx) ≥ [ℜf(t)]^n + n[cos(tx) − ℜf(t)][ℜf(t)]^{n−1}.

After integration we obtain

    ∫_{−∞}^∞ cos^n(tx) dF(x) ≥ [ℜf(t)]^n.

Then from (1.4.1) it follows that

    ∫_{−∞}^∞ [(n − 1) + cos(ntx)] dF(x) ≥ n[ℜf(t)]^n,

or

    1 − ℜf(nt) ≤ n{1 − [ℜf(t)]^n}.

The right inequality of the theorem follows from the elementary bound 1 − a^n ≤ n(1 − a), |a| ≤ 1.

COROLLARY 1.4.1. For any characteristic function f(t) and any positive integer n,

    1 − |f(nt)|² ≤ n²[1 − |f(t)|²]

for all t ∈ (−∞, ∞).
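A grid spot-check of the corollary for the exponential law (an example of our choosing): there |f(t)|² = 1/(1 + t²), so the claim reads 1 − 1/(1 + n²t²) ≤ n²(1 − 1/(1 + t²)).

```python
def sq_mod(t):
    # |f(t)|^2 for the exponential cf f(t) = 1/(1 - it)
    return 1.0 / (1.0 + t * t)

for n in [1, 2, 3, 7]:
    for t in [x / 10.0 for x in range(-50, 51)]:
        assert 1 - sq_mod(n * t) <= n * n * (1 - sq_mod(t)) + 1e-12
```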

Theorem 1.4.1 was proved first for n = 2^k and then extended to all integer n in (Heathcote & Pitman, 1972). Further extension is impossible in the sense that for any non-integer positive b there exist a characteristic function f(t) and t₀ such that 1 − ℜf(bt₀) > b²[1 − ℜf(t₀)]. Indeed, consider the characteristic function f(t) = cos t and set t₀ = 2π. Then

    0 = b²[1 − cos 2π] = b²[1 − ℜf(t₀)] < 1 − cos 2πb = 1 − ℜf(bt₀)

for any non-integer positive b.


Let (aij)1j=1 stand for an nxn matrix with elements ay, i j = 1,2,.,.,. The
Bochner-Khintchine theorem means that any matrix of the form (/(ij tj))*j = v
where f(t) is a characteristic function, is non-negatively definite for any and
t\,...,tn. This implies, in particular, that the determinant of such a matrix is
always non-negative. Using this fact, we arrive at the following inequality.

THEOREM 1.4.2. For any characteristic function f(t) and any t₁ and t₂,

    |f(t₁ + t₂)| ≥ |f(t₁)||f(t₂)| − (1 − |f(t₁)|²)^{1/2}(1 − |f(t₂)|²)^{1/2}.    (1.4.4)

PROOF. Set n = 3, t₃ = 0. By virtue of the Bochner–Khintchine theorem,

        ⎛ 1          f(−t₁)       f(−t₂)     ⎞
    det ⎜ f(t₁)      1            f(t₁ − t₂) ⎟ ≥ 0,
        ⎝ f(t₂)      f(t₂ − t₁)   1          ⎠

i.e.,

    1 + f(−t₁)f(t₂)f(t₁ − t₂) + f(t₁)f(−t₂)f(t₂ − t₁)
      − |f(t₁)|² − |f(t₂)|² − |f(t₁ − t₂)|² ≥ 0,

which yields

    1 + 2|f(t₁)||f(t₂)||f(t₁ − t₂)| − |f(t₁)|² − |f(t₂)|² − |f(t₁ − t₂)|² ≥ 0.    (1.4.5)

Let us demonstrate that

    |f(t₁ − t₂)| ≥ |f(t₁)||f(t₂)| − (1 − |f(t₁)|²)^{1/2}(1 − |f(t₂)|²)^{1/2}.    (1.4.6)

Indeed, if |f(t₁ − t₂)| ≥ |f(t₁)||f(t₂)|, then (1.4.6) is obvious. If |f(t₁ − t₂)| < |f(t₁)||f(t₂)|, then (1.4.6) is equivalent to (1.4.5) (this can be verified directly) and therefore is again true.

Now replacing t₂ by −t₂ in (1.4.6) and taking into account that |f(t)| = |f(−t)|, we finally obtain (1.4.4).
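A grid check of (1.4.4) for the standard normal cf f(t) = exp(−t²/2) (the distribution is our own choice of example):

```python
import math

def f(t):
    # standard normal characteristic function (real-valued)
    return math.exp(-t * t / 2)

for i in range(-20, 21):
    for j in range(-20, 21):
        t1, t2 = i / 5.0, j / 5.0
        lhs = f(t1 + t2)
        rhs = (f(t1) * f(t2)
               - math.sqrt((1 - f(t1) ** 2) * (1 - f(t2) ** 2)))
        assert lhs >= rhs - 1e-12
```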

COROLLARY 1.4.2. Let f(t) be a characteristic function. If |f(t₁)| ≥ cos φ₁ and |f(t₂)| ≥ cos φ₂, where φ₁ ≥ 0, φ₂ ≥ 0, and φ₁ + φ₂ ≤ π/2, then

    |f(t₁ + t₂)| ≥ cos(φ₁ + φ₂).

In particular, if

    |f(t)| ≥ cos φ,

then

    |f(nt)| ≥ cos(nφ).

From Theorem 1.4.2, we obtain the following interesting assertion.

LEMMA 1.4.1. Let f(t) be a characteristic function. Suppose that

    |f(t₁)| ≥ 1 − ct₁²    (1.4.7)

and

    |f(t₂)| ≥ 1 − ct₂²,    (1.4.8)

where c is a positive constant, and t₁ ≥ 0, t₂ ≥ 0. Then

    |f(t₁ + t₂)| ≥ 1 − c(t₁ + t₂)².    (1.4.9)

The sign '≥' can be replaced by '>' simultaneously in all three inequalities (1.4.7)–(1.4.9).

PROOF. Without loss of generality we may assume that t₁² ≤ 1/c and t₂² ≤ 1/c. Otherwise the right-hand side of (1.4.9) is negative, and the assertion of the lemma is obvious.

Suppose the contrary: let (1.4.7) and (1.4.8) hold but (1.4.9) do not hold, i.e.,

    |f(t₁ + t₂)| < 1 − c(t₁ + t₂)².    (1.4.10)

From (1.4.7) and (1.4.8), taking into account that 1 − ct₁² ≥ 0 and 1 − ct₂² ≥ 0, we obtain

    |f(t₁)||f(t₂)| − (1 − |f(t₁)|²)^{1/2}(1 − |f(t₂)|²)^{1/2}
      ≥ (1 − ct₁²)(1 − ct₂²) − [1 − (1 − ct₁²)²]^{1/2}[1 − (1 − ct₂²)²]^{1/2}
      = 1 − ct₁² − ct₂² + c²t₁²t₂² − (2ct₁² − c²t₁⁴)^{1/2}(2ct₂² − c²t₂⁴)^{1/2},

while (1.4.10) means that

    |f(t₁ + t₂)| < 1 − ct₁² − ct₂² − 2ct₁t₂;

therefore, making use of Theorem 1.4.2, we obtain

    c²t₁²t₂² + 2ct₁t₂ < (2ct₁² − c²t₁⁴)^{1/2}(2ct₂² − c²t₂⁴)^{1/2},

which is equivalent to

    (t₁ + t₂)² < 0,

which is impossible. The contradiction proves (1.4.9).

The case '>' is proved in a similar way.

COROLLARY 1.4.3. Let f(t) be a characteristic function. Then |f(t₀)| ≥ 1 − ct₀² implies |f(nt₀)| ≥ 1 − c(nt₀)² for any n = 1, 2, …

THEOREM 1.4.3. Let f(t) be a characteristic function. Then the following assertions take place.

(1) If |f(t)| ≥ 1 − ct² (c is a constant) in some neighborhood of the origin, t ≠ 0, then |f(t)| ≥ 1 − ct² for all t. The assertion remains true if '≥' is replaced by '>' in both inequalities.

(2) If ℜf(t) ≥ 1 − ct² in some neighborhood of the origin, t ≠ 0, then ℜf(t) ≥ 1 − ct² for all t. The assertion remains true if '≥' is replaced by '>' in both inequalities.

PROOF. Assertion 1 immediately follows from Lemma 1.4.1. Let us prove assertion 2.

From the left inequality of Theorem 1.4.1, setting n = 2, we obtain

    [ℜf(t)]² ≤ (1 + ℜf(2t))/2    (1.4.11)

for all t. Suppose that ℜf(t) ≥ 1 − ct² in some neighborhood of the origin but not everywhere. Let (−b, b) be the widest interval such that ℜf(t) ≥ 1 − ct² for t ∈ (−b, b), t ≠ 0. Then ℜf(b) = 1 − cb², and taking (1.4.11) into account we obtain

    1 − cb²/2 + c²b⁴/16 = (1 − cb²/4)² ≤ [ℜf(b/2)]²
      ≤ (1 + ℜf(b))/2 = (1 + 1 − cb²)/2 = 1 − cb²/2.

This contradiction proves the assertion.

THEOREM 1.4.4. Let f(t) be a characteristic function such that

    |f(t)| ≤ c for |t| ≥ b,    (1.4.12)

where 0 < c < 1, b > 0. Then

    |f(t)| ≤ 1 − ((1 − c)/(4b²)) t²    (1.4.13)

for |t| < b.

PROOF. Suppose the contrary: let (1.4.13) do not hold. Then there exists t₀ ∈ (0, b] such that

    |f(t₀)| > 1 − ((1 − c)/(4b²)) t₀².    (1.4.14)

Let n be the nonnegative integer satisfying the relations

    b/2^{n+1} < t₀ ≤ b/2^n.

Since 2^{n+1} t₀ > b, we obtain

    |f(2^{n+1} t₀)| ≤ c.    (1.4.15)

On the other hand, from (1.4.14) and Corollary 1.4.3, we obtain

    |f(2^{n+1} t₀)| > 1 − ((1 − c)/(4b²)) (2^{n+1} t₀)²
                   ≥ 1 − ((1 − c)/(4b²)) (2b)² = c,

which contradicts (1.4.15).

In connection with Theorem 1.4.4, the question arises whether the coefficient 1/4 in the term ((1 − c)/(4b²))t² can be improved (made larger) or not. It is clear that this constant cannot be larger than 1.

THEOREM 1.4.5. For any characteristic function f(t),

    [ℑf(t)]² ≤ (1 − ℜf(2t))/2

and

    |f(t + s) − f(t)|² ≤ 2[1 − ℜf(s)]

for all real s and t.

PROOF. Let us prove the first inequality. Making use of the Cauchy–Schwarz–Buniakowskii inequality, we obtain

    |ℑf(t)| = | ∫_{−∞}^∞ sin(tx) dF(x) |
            ≤ ( ∫_{−∞}^∞ sin²(tx) dF(x) )^{1/2}
            = ( (1/2) ∫_{−∞}^∞ [1 − cos(2tx)] dF(x) )^{1/2}
            = ( [1 − ℜf(2t)]/2 )^{1/2}.

The second inequality is proved similarly.
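A check of the first inequality for the exponential cf f(t) = 1/(1 − it) (our example): there ℜf(t) = 1/(1 + t²) and ℑf(t) = t/(1 + t²).

```python
for k in range(-100, 101):
    t = k / 10.0
    im = t / (1 + t * t)            # Im f(t)
    re2 = 1 / (1 + 4 * t * t)       # Re f(2t)
    assert im * im <= (1 - re2) / 2 + 1e-12
```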

THEOREM 1.4.6. Let f(t) be the characteristic function of a non-degenerate distribution (not concentrated in one point). Then there exist positive numbers ε and δ such that |f(t)| ≤ 1 − εt² for |t| ≤ δ.

Of course, the inequality in Theorem 1.4.6 is not uniform with respect to characteristic functions, i.e., the numbers ε and δ cannot be chosen to be the same for all characteristic functions. In fact, this is impossible even in each class of characteristic functions whose first n moments exist and are fixed. Let us demonstrate this. For the sake of clarity, we consider the case of n = 2, i.e., where the expectation and the variance are fixed. In the general case, an example can be constructed similarly, using the same idea. Without loss of generality we can suppose that the expectation is equal to zero. Thus, for any sequences t₁, t₂, … and ε₁, ε₂, … (tₖ > 0, εₖ > 0, k = 1, 2, …) such that tₖ → 0 and εₖ → 0 as k → ∞, we have to construct a sequence of characteristic functions f₁(t), f₂(t), …, whose distributions have zero expectations and the same variances σ², and such that |fₖ(tₖ)| > 1 − εₖtₖ² for k large enough. Let us set

    fₖ(t) = pₖ + (1 − pₖ) cos( σt/√(1 − pₖ) ),

where 1 > pₖ > 1 − εₖtₖ². Then, as is easy to see, the variances of the distributions corresponding to the characteristic functions fₖ(t) are equal to σ², and, for k large enough, fₖ(tₖ) > pₖ > 1 − εₖtₖ².
Now we present several so-called truncation inequalities connecting the
behavior of the tail of a distribution function with the behavior of the corre-
sponding characteristic function in the neighborhood of the origin.

THEOREM 1.4.7. Let F(x) be a distribution function with characteristic function f(t). Then for any 0 < a < 2π,

    ∫_{|x|≤a/t} x² dF(x) ≤ ( a²/((1 − cos a) t²) ) [1 − ℜf(t)]    (1.4.16)

for all 0 < t < a.

PROOF. Let us fix an arbitrary a ∈ (0, 2π). Since the function (1 − cos x)/x² decreases for 0 < x < 2π, we obtain

    cos x ≤ 1 − ((1 − cos a)/a²) x²,  |x| ≤ a.

Hence,

    ℜf(t) = ∫_{−∞}^∞ cos(tx) dF(x) ≤ ∫_{|tx|≤a} (1 − c(tx)²) dF(x) + ∫_{|tx|>a} dF(x)
          ≤ 1 − t²c ∫_{|x|≤a/t} x² dF(x),

where c = (1 − cos a)/a², which implies (1.4.16).

COROLLARY 1.4.4. Let F(x) be a distribution function with characteristic function f(t); then for t > 0,

    ∫_{|x|≤1/t} x² dF(x) ≤ 3[1 − ℜf(t)]/t².

COROLLARY 1.4.5. Let F(x) be a distribution function with characteristic function f(t). Then for any 0 < a < 2π,

    ∫_{|x|≤a/t} x² dF(x) ≤ 2a² |1 − f(t)| / ((1 − cos a) t²)

for all 0 < t < a.

THEOREM 1.4.8. Let F(x) be a distribution function with characteristic function f(t). Then for t > 0,

    ∫_{|x|≥1/t} dF(x) ≤ (7/t) ∫₀^t [1 − ℜf(u)] du

and

    ∫_{|x|≥2/t} dF(x) ≤ (1/t) ∫_{−t}^t [1 − f(u)] du.
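The second inequality can be checked numerically for the uniform law on [−1, 1] (our own example): f(u) = sin(u)/u is real, and P(|X| ≥ 2/t) = max(0, 1 − 2/t).

```python
import math

def rhs(t, steps=20000):
    # (1/t) * int_{-t}^{t} (1 - sin(u)/u) du by the midpoint rule
    h = 2 * t / steps
    total = 0.0
    for k in range(steps):
        u = -t + (k + 0.5) * h
        total += (1 - (math.sin(u) / u if u != 0 else 1.0)) * h
    return total / t

for t in [1.0, 3.0, 8.0]:
    tail = max(0.0, 1 - 2 / t)   # P(|X| >= 2/t) for U[-1, 1]
    assert tail <= rhs(t) + 1e-6
```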

In the conclusion of this section, we give some results concerning estima-


tion of closeness of distributions via closeness of their characteristic functions.
Recent developments in this area are presented in Chapter 2.

THEOREM 1.4.9. Let F(x) and G(x) be two distribution functions with characteristic functions f(t) and g(t) respectively. If G(x) has a derivative and

    sup_x G′(x) ≤ a < ∞,

then for any positive T and any b > 1/(2π),

    sup_x |F(x) − G(x)| ≤ b ∫_{−T}^T | (f(t) − g(t))/t | dt + c(b) a/T,

where c(b) = b c₀(b), and c₀(b) is the root x of the equation

    ∫₀^x (sin²u / u²) du = π/4 + 1/(8b).
THEOREM 1.4.10. Let F(x) and G(x) be distribution functions of integer-valued random variables, and let f(t) and g(t) be their characteristic functions. Then

    sup_x |F(x) − G(x)| ≤ (1/4) ∫_{−π}^π | (f(t) − g(t))/sin(t/2) | dt.

Theorems 1.4.9 and 1.4.10 impose stringent conditions on the behavior of the characteristic functions in the neighborhood of the origin ((f(t) − g(t))/t must be integrable), and therefore they cannot be used, for example, when the tails of the distributions are of 'different weights'. The following theorem provides us with an estimate which is free of this disadvantage.

Let X be a random variable with distribution function F(x). The concentration function of X (or of F) is defined as

    Q_X(l) = Q(F; l) = sup_a P(a ≤ X ≤ a + l).

THEOREM 1.4.11. Let F(x) and G(x) be two distribution functions with characteristic functions f(t) and g(t) respectively. If G(x) is absolutely continuous, and the corresponding density q(x) is bounded:

    sup_x q(x) ≤ a < ∞,

then for any positive T and L satisfying the condition LT ≥ 2π, the inequality

    sup_x |F(x) − G(x)| ≤ c [ sup_{|t|≤T} |f(t) − g(t)| log(LT) + a/T + 1 − Q(G; L) ]

holds, where c is an absolute constant (c < 60π).

Theorem 1.4.11 is a particular case of the following general estimate. Let f(x) be a real-valued function defined on an interval [a, b] of the real line. The total variation of f(x) on [a, b] is defined as

    V_a^b(f) = sup Σ_{i=1}^n |f(x_i) − f(x_{i−1})|,

where the supremum is taken over all n and all collections x₀, x₁, …, xₙ such that a = x₀ < x₁ < … < xₙ = b. The total variation on the whole real line is defined as

    V_{−∞}^∞(f) = lim_{x→∞} V_{−x}^x(f).

For V_{−∞}^∞(f), we will omit the limits and write V(f).

A function f(x) is said to be a function of bounded total variation if V(f) < ∞.

THEOREM 1.4.12. Let F(x) be a non-decreasing function and G(x) be a function of bounded variation, and let f(t) and g(t) be their Fourier–Stieltjes transforms:

    f(t) = ∫_{−∞}^∞ e^{itx} dF(x),  g(t) = ∫_{−∞}^∞ e^{itx} dG(x).

Suppose that the following conditions are satisfied:

(1) F(−∞) = G(−∞);

(2) G(x) is differentiable, and |G′(x)| ≤ a.

Then for any positive T and L satisfying the condition LT ≥ 2π the inequality

    sup_x |F(x) − G(x)| ≤ c [ sup_{|t|≤T} |f(t) − g(t)| log(LT) + a/T + γ(L) ]

holds, where c is an absolute constant (c < 60π), and

    γ(L) = V_{−∞}^∞(G) − sup_x V_x^{x+L}(G).

PROOF. Denote

    ε = sup_{|t|≤T} |f(t) − g(t)|,  W(x) = F(x) − G(x),  Δ = sup_x |W(x)|,

and δ = 8π/T. Without loss of generality we may assume that Δ > 16ε/3, Δ > 16γ(L) and Δ > 16aδ, for otherwise the theorem would be obvious. We show now that we can choose two points x₀ < y₀ so that |x₀ − y₀| ≤ L + (8π + 1)/T and for x ∈ (x₀ − 4π/T, x₀ + 4π/T] and y ∈ (y₀ − 4π/T, y₀ + 4π/T] we obtain

    |W(x) − W(y)| ≥ Δ/4.    (1.4.17)

Let the points x₁ and y₁ be chosen so that y₁ − x₁ = L + 1/T and

    V_{−∞}^∞(G) − V_{x₁}^{y₁}(G) ≤ γ(L).

There are two possibilities:

(1) W(x₁) ≥ Δ/4 and −W(y₁) ≥ Δ/4;

(2) at least one of these requirements is violated.

In the first case, as x₀ and y₀ we may take x₁ + 4π/T and y₁ − 4π/T. Since F(x) is a non-decreasing function and, by condition 2 of the theorem, the increment of G(x) on intervals of length 8π/T does not exceed aδ, condition (1.4.17) is satisfied in this case.

In the second case we assume that W(x₁) < Δ/4. For x ≤ x₁ we get |W(x)| ≤ Δ/4 + γ(L). On the other hand, for y ≥ y₁ we obtain

    |W(y)| ≤ |W(y₁)| + |G(y) − G(y₁)| + |F(∞) − G(∞)| + |G(∞) − G(y)|
           ≤ |W(y₁)| + 2γ(L) + ε.

Therefore, there exists a point z such that x₁ ≤ z ≤ y₁ and |W(z)| ≥ 3Δ/4. Set x₀ = x₁ − 4π/T, and take y₀ = z + 4π/T for W(z) > 0 and y₀ = z − 4π/T for W(z) < 0. From the monotonicity of F(x) and the boundedness of the derivative of G(x) it follows that (1.4.17) holds. The case −W(y₁) < Δ/4 can be handled with obvious modifications.

Now consider two auxiliary functions:

    H(x) = (3T/(8π)) ( sin(xT/4)/(xT/4) )⁴,  h(t) = ∫_{−∞}^∞ e^{itx} H(x) dx.

It is easy to see that for these functions we have

    ∫_{−∞}^∞ H(x) dx = 1,  2 ∫_{4π/T}^∞ H(x) dx ≤ 1/80,    (1.4.18)

    |h(t)| ≤ 1,  h(t) = 0 for |t| ≥ T.    (1.4.19)

Now apply the inversion formula to the Fourier–Stieltjes transforms of the smoothed functions

    F₁(x) = ∫_{−∞}^∞ F(x − y) H(y) dy  and  G₁(x) = ∫_{−∞}^∞ G(x − y) H(y) dy.

We obtain

    I = F₁(x₀) − G₁(x₀) − F₁(y₀) + G₁(y₀)
      = (1/2π) ∫_{−T}^T ( (e^{−itx₀} − e^{−ity₀})/(it) ) [f(t) − g(t)] h(t) dt.

From the choice of x₀ and y₀, (1.4.17) and (1.4.18) it follows that

    |I| = | ∫_{−∞}^∞ [W(x₀ − z) − W(y₀ − z)] H(z) dz |
        ≥ (Δ/4)(1 − 1/80) − 2Δ/80 ≥ Δ/5.    (1.4.20)

On the other hand, from (1.4.19) and the fact that |x₀ − y₀| ≤ L + (8π + 1)/T we obtain

    |I| ≤ (1/π) ∫_{−T}^T ( |sin[t(x₀ − y₀)/2]| / |t| ) |f(t) − g(t)| dt
        ≤ (2ε/π) ∫₀^{T(x₀−y₀)/2} (|sin t|/t) dt
        ≤ (2ε/π) ∫₀^{(LT+8π+1)/2} (|sin t|/t) dt
        ≤ (2ε/π) [1 − ln 2 + ln(LT + 8π + 1)]
        ≤ c₂ ε ln(LT).    (1.4.21)

By comparing (1.4.20) and (1.4.21), we obtain the assertion of the theorem.

The problem of estimating the Lévy distance was studied in (Zolotarev, 1970; Zolotarev, 1971; Zolotarev & Senatov, 1975). Some of their results are given below.

Let F(x) and G(x) be two distribution functions. The Lévy distance L(F, G) between F and G is defined as the infimum of all positive h such that G(x) lies between F(x − h) − h and F(x + h) + h for all x, i.e.,

    L(F, G) = inf{h: F(x − h) − h ≤ G(x) ≤ F(x + h) + h, ∀x ∈ (−∞, ∞)}.

THEOREM 1.4.13. Let F(x) and G(x) be two distribution functions with characteristic functions f(t) and g(t) respectively. Then, for any T > 1.3, the inequality

    L(F, G) ≤ (1/π) ∫₀^T | (f(t) − g(t))/t | dt + 2e (log T)/T

holds.

As shown in (Zaitsev, 1987), the logarithmic factor log T on the right-hand side of this inequality cannot be removed. (Actually, the same was observed by V.A. Popov in 1985 (Popov, 1987) in his communications at the 9th International Seminar on Stability Problems for Stochastic Models in Varna, May 1985. — Eds.)

In the space of characteristic functions the following metric is often useful:

    λ(f, g) = min_{T>0} max { max_{|t|≤T} |f(t) − g(t)|, 1/T }.

Among a number of known relations connecting the metrics L(·, ·) and λ(·, ·), we point out the following one.

THEOREM 1.4.14. Let X and Y be random variables with distribution functions F(x) and G(x) and characteristic functions f(t) and g(t) respectively. Suppose that E|Y|^r < ∞ for some r > 0. Then

    λ(f, g) ≤ 12 γ_r [L(F, G)]^{r/(2r+1)}

and

    L(F, G) ≤ 8 ( 1 + ln γ_r + (1 + 1/r) ln(1/λ(f, g)) ) λ(f, g),

where γ_r = max{1, (E|Y|^r)^{1/r}}.

We also point out an estimate for the uniform distance between probability densities p(x) and q(x) via the L₁-distance between the corresponding characteristic functions f(t) and g(t):

    sup_x |p(x) − q(x)| ≤ (1/2π) ∫_{−∞}^∞ |f(t) − g(t)| dt,

which is just a trivial consequence of the inversion theorem for densities (Theorem 1.2.6); its discrete analog is: if X and Y are integer-valued random variables with characteristic functions f(t) and g(t), then

    sup_k |P(X = k) − P(Y = k)| ≤ (1/2π) ∫_{−π}^π |f(t) − g(t)| dt;
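The discrete analog is easy to test on an example of our own choosing: X ~ Bernoulli(1/2), Y ~ Bernoulli(1/4), for which sup_k |P(X = k) − P(Y = k)| = 1/4 while the bound evaluates to 1/π.

```python
import cmath, math

def f(t):  # cf of Bernoulli(1/2)
    return 0.5 + 0.5 * cmath.exp(1j * t)

def g(t):  # cf of Bernoulli(1/4)
    return 0.75 + 0.25 * cmath.exp(1j * t)

steps = 20000
h = 2 * math.pi / steps
integral = sum(abs(f(-math.pi + (k + 0.5) * h) - g(-math.pi + (k + 0.5) * h))
               for k in range(steps)) * h
bound = integral / (2 * math.pi)   # should equal 1/pi here

assert abs(bound - 1 / math.pi) < 1e-4
assert 0.25 <= bound               # the bound dominates sup_k |P diff|
```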

and recall the Parseval–Plancherel identity in the form of Theorem 1.2.12, which allows us to estimate the L₂-closeness of probability densities by the L₂-closeness of characteristic functions and vice versa.

We also present some results concerning estimates of the L₁-distance between distribution functions in terms of characteristic functions.

THEOREM 1.4.15. Let F(x) and G(x) be distribution functions with characteristic functions f(t) and g(t). Then for any T > 0,

The proof of the theorem can be found in (Ibragimov & Linnik, 1971).
If characteristic functions of two distributions coincide on some interval containing the origin, they are not necessarily the same (see Appendix A, Examples 11 and 12). However, this implies that the distribution functions must be close (the wider the interval, the closer). More exactly, the following estimate holds.

THEOREM 1.4.16. Let F(x) and G(x) be distribution functions with characteristic functions f(t) and g(t). If f(t) = g(t) for |t| ≤ T, then

    ∫_{−∞}^∞ |F(x) − G(x)| dx ≤ π/T.

The proof of the theorem is contained in (Esseen, 1945). It turns out that Theorem 1.4.16 is sharp, as we see from the assertion below.

PROPOSITION 1.4.1. The relation

    sup_{F,G} ∫_{−∞}^∞ |F(x) − G(x)| dx = π/T

is true, where the supremum is taken over the set of all pairs of distribution functions whose characteristic functions coincide on the interval [−T, T].

PROOF. Consider the sequence of probability densities p₁(x), p₂(x), … of the form

    pₙ(x) = cₙ (1 − cos x)ⁿ / x²,

where the constants cₙ > 0 are chosen so as to make pₙ(x) a density function for each n. For each n, let fₙ(t) be the characteristic function corresponding to pₙ(x). Then f₁(t) vanishes outside the interval [−1, 1] (Feller, 1971, p. 501), and if we write cos x = (e^{ix} + e^{−ix})/2, it is easily seen that fₙ(t) vanishes outside [−n, n]. Continuing fₙ(t) periodically with the period λ > 2n, we obtain the characteristic function f_{n,λ}(t) of an arithmetic distribution (Feller, 1971), with atoms at x = 2πk/λ, k = 0, ±1, ±2, …, of sizes

    F_{n,λ}(x) − F_{n,λ}(x − 0) = (2π/λ) pₙ(x)    (1.4.22)

(Feller, 1971), where F_{n,λ}(x) is the distribution function corresponding to the characteristic function f_{n,λ}(t). Moreover, the characteristic functions f_{n,λ}(t) and f_{n,λ+1}(t) coincide on the interval [−T, T], where

    T = T(n, λ) = λ − n.    (1.4.23)

To prove the proposition, it suffices to show that

    ∫_{−∞}^∞ |F_{n,λ}(x) − F_{n,λ+1}(x)| dx ≥ (π/λ)(1 + o(1)),  n → ∞,    (1.4.24)

since, in view of (1.4.23), π/λ approaches π/T(n, λ) when λ/n is large.

Set I_k = I_k(λ) = (2π(k − 1)/λ, 2πk/λ]. Then

    ∫_{−∞}^∞ |F_{n,λ}(x) − F_{n,λ+1}(x)| dx = Σ_k ∫_{I_k} |F_{n,λ}(x) − F_{n,λ+1}(x)| dx.    (1.4.25)

The distribution function F_{n,λ}(x) is constant within each interval I_k, while F_{n,λ+1}(x) has at least one jump there, say at x̃_k. Denote the fractional part of x by {x} ({x} = x − [x], where [x] is the maximum integer no greater than x). The point x̃_k divides I_k into two intervals of lengths

    (2π/λ) {x̃_k λ/(2π)}  and  (2π/λ) (1 − {x̃_k λ/(2π)}).

Hence by (1.4.22) and (1.4.25), with

    h(x) = min{ {xλ/(2π)}, 1 − {xλ/(2π)} },

we obtain

    ∫_{−∞}^∞ |F_{n,λ}(x) − F_{n,λ+1}(x)| dx
      ≥ Σ_k (2π/λ) h(x̃_k) [F_{n,λ+1}(x̃_k) − F_{n,λ+1}(x̃_k − 0)]
      = ( (2π)²/(λ(λ + 1)) ) Σ_k h(x̃_k) pₙ(x̃_k).    (1.4.26)

From (1.4.23) and (1.4.26), since the points x̃_k form a lattice with span 2π/(λ + 1), we obtain

    ∫_{−∞}^∞ |F_{n,λ}(x) − F_{n,λ+1}(x)| dx
      ≥ (2π/λ) ∫_{−∞}^∞ pₙ(x) h(x) dx (1 + o(1))
      = (2π/λ) ∫₀^π Ψₙ(x) h(x) dx (1 + o(1)),    (1.4.27)

where

    Ψₙ(x) = Σ_{k=−∞}^∞ ( pₙ(2πk + x) + pₙ(2πk − x) )
          = 2cₙ (1 − cos x)ⁿ Σ_{k=−∞}^∞ (2πk + x)^{−2},  0 ≤ x < π,

is the density obtained by folding pₙ onto [0, π) (here we use that h(x) is even and 2π-periodic for integer λ). We denote the distribution function corresponding to the density Ψₙ(x) by Ψ̄ₙ(x). Let us prove that

    Ψ̄ₙ(x) → 0 as n → ∞    (1.4.28)

for each x ∈ [0, π). Let us choose any y ∈ (x, π). Then

    1 ≥ ∫_y^π Ψₙ(u) du ≥ 2cₙ ∫_y^π (1 − cos u)ⁿ u^{−2} du ≥ 2cₙ (1 − cos y)ⁿ (π − y)/π²,

therefore

    Ψ̄ₙ(x) = ∫₀^x Ψₙ(u) du = (cₙ/c₁) ∫₀^x (1 − cos u)^{n−1} Ψ₁(u) du
           ≤ (cₙ/c₁) (1 − cos x)^{n−1} ∫₀^x Ψ₁(u) du
           ≤ π² (1 − cos x)^{n−1} / ( 2c₁ (π − y)(1 − cos y)ⁿ ) → 0,

i.e., (1.4.28) is proved. Relation (1.4.28) means that Ψₙ weakly converges to the degenerate distribution in π. This implies

    ∫₀^π Ψₙ(x) h(x) dx → h(π−) = 1/2

(for odd λ). Combining this relation with (1.4.27), we obtain (1.4.24).

1.5. Characteristic functions and moments,
expansions of characteristic functions,
asymptotic behavior

Let F(x) be a distribution function and let ν > 0 be a positive real number (not necessarily integer). The absolute moment of order ν of F(x) is defined as

    β_ν = ∫_{−∞}^∞ |x|^ν dF(x),

provided that the integral on the right-hand side exists and is finite. In this case the moment of order ν,

    α_ν = ∫_{−∞}^∞ x^ν dF(x),

exists also.

In this section, we present some results concerning the relationship between the existence of moments of a distribution function and the behavior of its characteristic function in the neighbourhood of the origin.

THEOREM 1.5.1. If a characteristic function f(t) is n times differentiable at zero, then

(a) all moments of order k ≤ n exist if n is even,

(b) all moments of order k ≤ n − 1 exist if n is odd.

For odd n, the condition k ≤ n − 1 cannot be weakened (see Example 24 of Appendix A).

COROLLARY 1.5.1. If a characteristic function f(t) is infinitely differentiable at zero, then all moments of the corresponding distribution exist.

THEOREM 1.5.2. Let F(x) be a distribution function with characteristic function f(t). If the n-th moment of F(x) exists, then f(t) is n times differentiable for all t, and

    f^{(n)}(t) = i^n ∫_{−∞}^∞ x^n e^{itx} dF(x).

If the moment α_n of order n exists, then it is related to the n-th derivative of the characteristic function by the formula

    α_n = (−i)^n f^{(n)}(0).
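The formula α_n = (−i)^n f^{(n)}(0) can be illustrated by a finite-difference computation for a fair die (our own example), where α₁ = 3.5:

```python
import cmath

def f(t):
    # cf of the uniform distribution on {1, ..., 6}
    return sum(cmath.exp(1j * k * t) for k in range(1, 7)) / 6

h = 1e-5
d1 = (f(h) - f(-h)) / (2 * h)   # central difference ~ f'(0)
# alpha_1 = (-i) * f'(0) should be the mean 3.5
assert abs((-1j) * d1 - 3.5) < 1e-6
```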

The existence of some moments of a distribution function guarantees the existence of the corresponding derivatives of the characteristic function; therefore, an expansion can be derived for the characteristic function.

THEOREM 1.5.3. Let F(x) be a distribution function with characteristic function f(t). If the first n moments of F(x) exist, then f(t) admits an expansion of the form

    f(t) = 1 + Σ_{k=1}^n (α_k/k!) (it)^k + o(|t|^n)

as |t| → 0.

Conversely, if f(t) admits an expansion of the form

    f(t) = 1 + Σ_{k=1}^n a_k t^k + o(|t|^n),  |t| → 0,

then each moment of F(x) of order k exists for k ≤ n if n is even and k ≤ n − 1 if n is odd. In this case

    a_k = i^k α_k / k!.
If a distribution function F(x) has a moment of order n + δ, where 0 < δ < 1, then an expansion for f(t) can be derived in the following form, which differs from that given by Theorem 1.5.3 in the remainder term.

THEOREM 1.5.4. Let F(x) be a distribution function with characteristic function f(t). If its moment of order n + δ exists, 0 < δ < 1, then f(t) admits an expansion of the form

    f(t) = 1 + Σ_{k=1}^n (α_k/k!) (it)^k + O(|t|^{n+δ})

as |t| → 0.

Conversely, if f(t) admits an expansion of the form

    f(t) = 1 + Σ_{k=1}^n a_k t^k + O(|t|^{n+δ}),  |t| → 0,

then F(x) has all moments of orders inferior to n + δ. In this case

    a_k = i^k α_k / k!.

Since the existence of the absolute moment β_λ (λ > 0) of a distribution function F(x) implies

1 - F(x) + F(-x) = o(x^{-\lambda}), \quad x \to \infty, \qquad (1.5.1)

and, conversely, (1.5.1) implies that β_μ exists for all μ < λ, the following theorem holds.

THEOREM 1.5.5. Let F(x) be a distribution function with characteristic function f(t). Then f(t) admits an expansion of the form

f(t) = 1 + \sum_{k=1}^{n} \frac{\alpha_k}{k!}(it)^k + o(|t|^{n+\delta})

as |t| → 0, where n is a positive integer and 0 < δ < 1, if and only if

1 - F(x) + F(-x) = o(x^{-(n+\delta)}), \quad x \to \infty.

The accuracy of the approximation of a characteristic function by the first terms of its Taylor expansion is given by the estimate

\left| f(t+s) - f(t) - \frac{s}{1!}f'(t) - \dots - \frac{s^n}{n!}f^{(n)}(t) \right| \le \frac{\beta_{n+1}|s|^{n+1}}{(n+1)!},

which is true for all t and s, β_{n+1} being the absolute moment of order n + 1.
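This estimate is easy to probe numerically. A hedged sketch with the standard exponential distribution (assumptions: f(t) = 1/(1 − it), exact derivatives f^{(k)}(t) = i^k k!(1 − it)^{−k−1}, β₃ = 3! = 6, so for n = 2 the right-hand side is |s|³):

```python
# Check |f(t+s) - f(t) - s f'(t) - (s^2/2) f''(t)| <= beta_3 |s|^3 / 3!
# for f(t) = 1/(1 - it) (standard exponential); beta_3 = 6, so the bound
# on the normalized remainder |R| / |s|^3 is 1.

def f(t, k=0):
    """k-th derivative of the exponential characteristic function."""
    fact = 1
    for j in range(1, k + 1):
        fact *= j
    return (1j ** k) * fact * (1.0 - 1j * t) ** (-(k + 1))

worst = 0.0
for i in range(-20, 21):
    for j in range(-20, 21):
        if j == 0:
            continue
        t, s = 0.3 * i, 0.3 * j
        taylor = f(t) + s * f(t, 1) + s**2 / 2 * f(t, 2)
        worst = max(worst, abs(f(t + s) - taylor) / abs(s) ** 3)

print(worst)  # stays below 1, as the estimate predicts
```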

THEOREM 1.5.6. Let F(x) be a distribution function with characteristic function f(t). Suppose that f^{(2n-1)}(t), where n is a positive integer, exists for all t. If there exists ε > 0 such that the function

\frac{f^{(2n-1)}(t) - f^{(2n-1)}(0)}{t} \qquad (1.5.2)

is bounded for 0 < |t| < ε, then the moment of order 2n of F(x) exists.
Conversely, if the moment of order 2n of F(x) exists, then function (1.5.2) is bounded on the set (−ε, ε) \ {0} for some positive ε.

THEOREM 1.5.7. Let F(x) be a distribution function with characteristic function f(t). Suppose that there exist a sequence t₁, t₂, … and a constant λ, 0 < λ < 2, such that

(a) lim_{n→∞} t_n = 0;

(b) the series Σ_n |t_n|^ε converges for any ε > 0;

(c) the sequence {t_{n−1}/t_n} is bounded;

(d) log|f(t_n)| / |t_n|^λ is bounded.

Then F(x) has moments of any order inferior to λ.

Condition (d) can be replaced with the condition that (1 − |f(t_n)|)/|t_n|^λ is bounded, or with the condition that (1 − ℜf(t_n))/|t_n|^λ is bounded.

THEOREM 1.5.8. Let F(x) be a distribution function with characteristic function f(t). Suppose that a moment of order 2n, where n is a non-negative integer, exists. Denote u(t) = ℜf(t). Then a moment of order 2n + δ, 0 < δ < 2, exists if and only if the integral

\int_{-c}^{c} (-1)^n\,\frac{u^{(2n)}(0) - u^{(2n)}(t)}{|t|^{1+\delta}}\,dt

exists and is finite for some c > 0.

THEOREM 1.5.9. Let F(x) be a distribution function with characteristic function f(t). Suppose that a moment of order 2n, where n is a non-negative integer, exists. Denote u(t) = ℜf(t). Then, for 0 < δ < 2,

\beta_{2n+\delta} = (-1)^n\,\frac{\Gamma(1+\delta)\sin(\pi\delta/2)}{\pi} \int_{-\infty}^{\infty} \frac{u^{(2n)}(0) - u^{(2n)}(t)}{|t|^{1+\delta}}\,dt.

Some other results concerning fractional moments and fractional derivatives of the characteristic function can be found in (Hsu, 1951; Zolotarev, 1957; Ramachandran & Rao, 1968; Ramachandran, 1969; Brown, 1972; Kagan et al., 1973; Lukacs, 1983).

THEOREM 1.5.10. Let F(x) be a distribution function with characteristic function f(t) and 0 < α < 2. Then

1 - F(x) + F(-x) = O(|x|^{-\alpha}), \quad |x| \to \infty,

if and only if

1 - \Re f(t) = O(|t|^{\alpha}), \quad |t| \to 0.

This theorem can be formulated in the following equivalent form.

THEOREM 1.5.11. Let F(x) be a distribution function with characteristic function f(t) and 0 < α < 2. Then

1 - F(x) + F(-x) = O(|x|^{-\alpha}), \quad |x| \to \infty,

if and only if

1 - |f(t)| = O(|t|^{\alpha}), \quad |t| \to 0.

In Theorem 1.5.10, as well as in Theorem 1.5.11, O can be replaced with o simultaneously in both relations (for the distribution function and for the characteristic function).
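A quick numerical sanity check of Theorem 1.5.11 (a sketch under the assumption of the standard Cauchy distribution, for which f(t) = e^{−|t|} and F(x) = 1/2 + arctan(x)/π are classical): both the tail and the local behavior of the characteristic function exhibit the order α = 1.

```python
import math

# Standard Cauchy: the tail 1 - F(x) + F(-x) is O(|x|^{-1}) and,
# correspondingly, 1 - |f(t)| = 1 - exp(-|t|) is O(|t|) as t -> 0.

def tail(x):
    return 2 * (0.5 - math.atan(x) / math.pi)

# tail(x) * x stays bounded (it tends to 2/pi)
tail_ratios = [tail(x) * x for x in (10.0, 100.0, 1000.0)]

# (1 - |f(t)|) / |t| stays bounded (it tends to 1)
cf_ratios = [(1 - math.exp(-t)) / t for t in (0.1, 0.01, 0.001)]

print(tail_ratios, cf_ratios)
```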

1.6. Unimodality
The concept of unimodality was introduced in (Khintchine, 1938). A random variable X and its distribution function F(x) are called unimodal with the mode at a if F(x) is convex on (−∞, a) and concave on (a, ∞). Unimodal distributions play an important role in many branches of probability theory and statistics. This section brings together some of the basic results concerning characteristic functions of unimodal distributions.
For simplicity, we will consider below only unimodal distributions with the zero mode. Possible extensions to the general case are evident.
In (Khintchine, 1938), the following criterion for characteristic functions
of unimodal distributions was proved.

THEOREM 1.6.1. Let φ(t) be an arbitrary characteristic function. Then the function

f(t) = \frac{1}{t}\int_0^t \varphi(u)\,du \qquad (1.6.1)

is a characteristic function of a unimodal (with mode at zero) distribution.
Conversely, any characteristic function of a unimodal (with mode at zero) distribution can be represented in the form (1.6.1), where φ(u) is a characteristic function.
This criterion can be reformulated as follows.

THEOREM 1.6.2. Let G(x) be an arbitrary distribution function. Then the function

f(t) = \int_{-\infty}^{\infty} e^{iut}\,\frac{\sin(ut)}{ut}\,dG(u) \qquad (1.6.2)

is the characteristic function of a unimodal (with mode at zero) distribution.
Conversely, any characteristic function of a unimodal (with mode at zero) distribution can be represented in the form (1.6.2), where G(u) is a distribution function.

PROOF. Let a function f(t) be represented in the form (1.6.2), where G(u) is a distribution function. Taking into account that

w_t(u) = e^{iut}\,\frac{\sin(ut)}{ut}

is (as a function of u) the characteristic function of the uniform distribution on [0, 2t], and using Theorem 1.2.9, we derive

f(t) = \int_{-\infty}^{\infty} w_t(u)\,dG(u) = \frac{1}{2t}\int_0^{2t} g(u)\,du = \frac{1}{t}\int_0^t \varphi(u)\,du,

where g(t) is the characteristic function corresponding to the distribution function G(x) and φ(u) = g(2u).
Conversely, let f(t) be the characteristic function of a unimodal (with mode at zero) distribution. Then (1.6.1) holds true, where φ(u) is a characteristic function. Denote the distribution function corresponding to φ(u) by Φ(x). Then, by virtue of Theorem 1.2.9,

f(t) = \frac{1}{t}\int_0^t \varphi(u)\,du = \int_{-\infty}^{\infty} e^{iut/2}\,\frac{\sin(ut/2)}{ut/2}\,d\Phi(u) = \int_{-\infty}^{\infty} e^{iut}\,\frac{\sin(ut)}{ut}\,dG(u),

where G(u) = Φ(2u).

COROLLARY 1.6.1. Let G(x) be an arbitrary distribution function. Then the function

f(t) = \int_{-\infty}^{\infty} \frac{\sin(ut)}{ut}\,dG(u) \qquad (1.6.3)

is the characteristic function of a unimodal distribution symmetric about zero.
Conversely, the characteristic function of any unimodal distribution symmetric about zero can be represented in the form (1.6.3), where G(u) is a distribution function.
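Representation (1.6.1) is equivalent to the classical multiplicative form of Khintchine's theorem: X is unimodal with mode at zero if and only if X has the same distribution as UY, where U is uniform on (0, 1) and independent of Y (whose characteristic function is φ). A small numerical sketch (assuming Y standard normal) computes the density of UY and confirms that it decreases away from the mode:

```python
import math

# Density of X = U * Y, U ~ Uniform(0,1) independent of Y ~ N(0,1):
# p(x) = integral over u in (0,1) of (1/u) phi(x/u) du, where phi is the
# standard normal density.  By Khintchine's theorem this density must be
# non-increasing for x > 0 (mode at zero).

def p(x, n=20000):
    total = 0.0
    for i in range(1, n + 1):
        u = (i - 0.5) / n            # midpoint rule on (0, 1)
        total += math.exp(-(x / u) ** 2 / 2) / (u * math.sqrt(2 * math.pi))
    return total / n

values = [p(x) for x in (0.25, 0.5, 1.0, 2.0)]
print(values)  # strictly decreasing
```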

Theorem 1.6.1 implies the following necessary condition for characteristic


functions of unimodal distributions.

THEOREM 1.6.3. Let f(t) be the characteristic function of a unimodal distribution. Then f(t) is differentiable for (at least) all t ≠ 0.

This assertion immediately follows from representation (1.6.1).


We point out one more property of unimodal distributions (and their char-
acteristic functions) which will be frequently used in Chapter 2.

THEOREM 1.6.4. Let f(t) be the characteristic function of a unimodal distribution. Then the distribution corresponding to |f(t)|² is also unimodal.

In terms of random variables, this property can be formulated as follows: if X₁ and X₂ are independent random variables distributed by one and the same unimodal law, then the distribution of X₁ − X₂ is also unimodal. Note that this is not true for the sum: f²(t) does not necessarily correspond to a unimodal distribution if f(t) does (see Appendix A, Example 30).
The necessary and sufficient conditions given by Theorems 1.6.1 and 1.6.2
are often hard to verify for a given function; therefore useful sufficient condi-
tions are of interest. One of these conditions is given below.

THEOREM 1.6.5. Let f(t) be a real-valued, continuous function, differentiable for t > 0, satisfying the conditions

(1) f(0) = 1;

(2) f(t) = f(−t);

(3) lim_{t→∞} f(t) = 0;

(4) −f′(t) is convex for t > 0.

Then f(t) is the characteristic function of a unimodal distribution.

PROOF. We prove the theorem under the additional conditions that f(t) is three times differentiable, −f‴(t) is non-negative for t > 0,

\int_0^{\infty} t\,|f'(t)|\,dt < \infty,

and

\lim_{t\to\infty} t^2 f(t) = \lim_{t\to\infty} t^3 f'(t) = \lim_{t\to\infty} t^4 f''(t) = 0.

A complete (without any additional restrictions) proof is contained in (Askey, 1975).
It suffices to prove that the function

p(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} f(t)e^{-itx}\,dt = \frac{1}{\pi}\int_0^{\infty} f(t)\cos(tx)\,dt

is unimodal. This will be proved if we show that p′(x) ≤ 0 for x > 0. But

p'(x) = -\frac{1}{\pi}\int_0^{\infty} t f(t)\sin(tx)\,dt,

so we need to show that

I(x) = \int_0^{\infty} t f(t)\sin(tx)\,dt \ge 0

for x > 0. Integrate by parts, differentiating f(t) and integrating the other factor. Then

I(x) = \left[ f(t)\int_0^t u\sin(ux)\,du \right]_0^{\infty} - \int_0^{\infty} f'(t)\int_0^t u\sin(ux)\,du\,dt,

and the integrated terms vanish by the assumptions we made. Integrate by parts twice more; again the integrated terms vanish, and we obtain

I(x) = \int_0^{\infty} f''(t)\int_0^t u(t-u)\sin(ux)\,du\,dt = -\frac{1}{2}\int_0^{\infty} f'''(t)\int_0^t u(t-u)^2\sin(ux)\,du\,dt.

Since −f‴(t) ≥ 0 for t > 0, it is sufficient to prove that

g(t) = \int_0^t u(t-u)^2\sin(ux)\,du \ge 0

for t > 0 and x > 0. Setting ux = v, we obtain

g(t) = \frac{1}{x^4}\int_0^{tx} (tx - v)^2 v\sin v\,dv,

so it is sufficient to prove that

h(t) = \int_0^t (t-v)^2 v\sin v\,dv \ge 0

for t > 0. Denote by H(z) the Laplace transform of h(t). We have

H(z) = \int_0^{\infty} e^{-tz}\int_0^t (t-v)^2 v\sin v\,dv\,dt = \int_0^{\infty} e^{-tz} t^2\,dt \int_0^{\infty} e^{-zv} v\sin v\,dv = \frac{2}{z^3}\,\Im\frac{1}{(z-i)^2} = \frac{4}{z^2(z^2+1)^2}.

But

\frac{1}{z(z^2+1)} = \int_0^{\infty} e^{-zt}(1 - \cos t)\,dt;

therefore

\int_0^t (t-v)^2 v\sin v\,dv = 4\int_0^t (1 - \cos v)(1 - \cos(t-v))\,dv,

and the right-hand side is obviously positive when t > 0.
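A concrete numerical check of the theorem (a sketch of an example, not from the proof): f(t) = e^{−|t|} satisfies the conditions, since −f′(t) = e^{−t} is convex for t > 0, and its distribution is the standard Cauchy law with the unimodal density 1/(π(1 + x²)), which numerical inversion recovers:

```python
import math

# f(t) = exp(-|t|) satisfies the hypotheses of Theorem 1.6.5
# (-f'(t) = exp(-t) is convex for t > 0).  Its distribution is the
# standard Cauchy law; invert numerically and compare with
# p(x) = 1 / (pi (1 + x^2)), unimodal with mode at zero.

def p_inv(x, T=60.0, n=200000):
    """p(x) = (1/pi) * integral_0^T exp(-t) cos(tx) dt (midpoint rule)."""
    h = T / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += math.exp(-t) * math.cos(t * x)
    return total * h / math.pi

errs = [abs(p_inv(x) - 1 / (math.pi * (1 + x * x))) for x in (0.0, 0.7, 2.0)]
print(errs)  # all close to zero
```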

The notion of unimodality has proved to be very fruitful; therefore, its analog has been introduced for discrete distributions. A random variable X taking values a + nh, n = 0, ±1, ±2, …, with probabilities p_n, and the corresponding distribution function F(x) are called discrete unimodal with the mode at a₀ = a + n₀h if p_n ≥ p_{n−1} for n ≤ n₀ and p_n ≤ p_{n−1} for n ≥ n₀ + 1.
For simplicity, we will consider below only integer-valued discrete unimodal random variables (and corresponding distributions) with the zero mode and, respectively, use the term 'discrete unimodal' only for these distributions.
For characteristic functions of discrete unimodal distributions, the following analog of the Khintchine criterion was obtained in (Medgyessy, 1972).

THEOREM 1.6.6. The characteristic function f(t) of an integer-valued random variable is the characteristic function of a discrete unimodal distribution (with mode at zero) if and only if it can be represented in the form

f(t) = \frac{-ie^{i\lambda t}}{1 - e^{it}}\int_0^t \varphi(u)\,du, \qquad (1.6.4)

where λ (0 < λ < 1) is arbitrary and φ(u) is the characteristic function of some discrete distribution.

There is a difference between the Khintchine criterion and the Medgyessy one. The first of them gives necessary and sufficient conditions for φ(u) under which it generates the characteristic function of a unimodal distribution. The Medgyessy criterion gives only a necessary condition for φ(u) (it must be the characteristic function of a discrete distribution). This condition is not sufficient: there exist discrete distributions such that the right-hand side of (1.6.4) is not a characteristic function at all (but, as follows from the Medgyessy criterion, if the right-hand side of (1.6.4) is a characteristic function, then it is the characteristic function of a discrete unimodal distribution). Thus, the Medgyessy criterion gives conditions for f(t) to be the characteristic function of a discrete unimodal distribution under the condition that it is a characteristic function.
Below we present two criteria of the Khintchine type.

THEOREM 1.6.7. Let φ(t) be the characteristic function of a random variable Y taking values k + 1/2, k = 0, ±1, ±2, …, and satisfying the following condition:

\sum_{k=-\infty}^{\infty}\,\sum_{j=k}^{\infty} \frac{P(Y = j + 1/2)}{2j + 1} = \frac{1}{2}. \qquad (1.6.5)

Then the function

f(t) = \frac{1}{2\sin(t/2)}\int_0^t \varphi(u)\,du, \qquad f(2\pi n) = 1, \quad n = 0, \pm 1, \pm 2, \dots, \qquad (1.6.6)

is the characteristic function of a discrete unimodal (with mode at zero) distribution.
Conversely, the characteristic function of any discrete unimodal (with mode at zero) distribution can be represented in the form (1.6.6), where φ(u) is the characteristic function of a random variable Y taking values k + 1/2, k = 0, ±1, ±2, …, and satisfying condition (1.6.5).

PROOF. Let Y be a random variable taking values k + 1/2, k = 0, ±1, ±2, …, and satisfying condition (1.6.5). Denote its distribution function and characteristic function by Φ(x) and φ(t), respectively. We set

p_k = 2\sum_{j=k}^{\infty} \frac{P(Y = j + 1/2)}{2j + 1}, \qquad b_k = \sum_{j=-\infty}^{k-1} P(Y = j + 1/2).

Consider the distribution function G(x) such that

G(x) = p_k x + b_k \qquad (1.6.7)

for k − 1/2 < x ≤ k + 1/2, k = 0, ±1, ±2, … Denote its characteristic function by g(t):

g(t) = \int_{-\infty}^{\infty} e^{itx}\,dG(x).

Then, as we easily see, G(x) is a continuous distribution function such that

G(x) - xG'(x) = \Phi(x)

(the forward derivative is taken at the points x = k + 1/2). Then, due to (Khintchine, 1938), G(x) is a unimodal distribution function (with mode at zero), and

g(t) = \frac{1}{t}\int_0^t \varphi(u)\,du. \qquad (1.6.8)

Further, (1.6.7) implies that

G(x) = \int_{-\infty}^{\infty} H(x - y)\,dF(y),

where

H(x) = \begin{cases} x + 1/2, & -1/2 < x \le 1/2, \\ 0, & x \le -1/2, \\ 1, & x > 1/2, \end{cases} \qquad (1.6.9)

and F(x) is the distribution function of an integer-valued random variable. Therefore

g(t) = \frac{\sin(t/2)}{t/2}\,f(t),

where

f(t) = \int_{-\infty}^{\infty} e^{itx}\,dF(x).

Taking (1.6.8) into account, we obtain

f(t) = \frac{1}{2\sin(t/2)}\int_0^t \varphi(u)\,du, \qquad f(2\pi n) = 1, \quad n = 0, \pm 1, \pm 2, \dots

Conversely, let f(t) be the characteristic function of an integer-valued, discrete unimodal random variable. Then the function

g(t) = \frac{\sin(t/2)}{t/2}\,f(t)

is the characteristic function of a random variable with the distribution function

G(x) = \int_{-\infty}^{\infty} H(x - y)\,dF(y),

where H(x) is defined by formula (1.6.9), and F(x) is the distribution function corresponding to the characteristic function f(t); i.e., G(x) is a continuous distribution function, linear on each interval [k − 1/2, k + 1/2], convex for x < 0 and concave for x > 0. In other words, on each interval [k − 1/2, k + 1/2), k = 0, ±1, ±2, …, the distribution function G(x) can be represented in the form

G(x) = a_k x + b_k,

where a_k ≥ 0, b_k ≥ 0, a_k increases for k ≤ 0 and decreases for k ≥ 0, and b_k increases for all k. Therefore (see (Khintchine, 1938))

G(x) - xG'(x) = \Phi(x) \qquad (1.6.10)

and

g(t) = \frac{1}{t}\int_0^t \varphi(u)\,du,

where Φ(x) is some distribution function and φ(t) is the corresponding characteristic function.
From (1.6.10) we obtain Φ(x) = b_k for x ∈ (k − 1/2, k + 1/2), k = 0, ±1, ±2, … This means that Φ is a discrete distribution function having jumps b_{k+1} − b_k at the points k + 1/2, k = 0, ±1, ±2, …, i.e.,

P(Y = k + 1/2) = b_{k+1} - b_k.

From the continuity of G(x) we obtain

a_k(k + 1/2) + b_k = a_{k+1}(k + 1/2) + b_{k+1},

or

a_k - a_{k+1} = \frac{2(b_{k+1} - b_k)}{2k + 1} = \frac{2P(Y = k + 1/2)}{2k + 1}. \qquad (1.6.11)

But

a_k = P(k - 1/2 < X < k + 1/2),

where X is a random variable with the distribution function G(x); therefore, summing (1.6.11) for all k beginning with some k = i, we obtain

a_i = P(i - 1/2 < X < i + 1/2) = 2\sum_{k=i}^{\infty} \frac{P(Y = k + 1/2)}{2k + 1}.

Hence

1 = \sum_{i=-\infty}^{\infty} P(i - 1/2 < X < i + 1/2) = \sum_{i=-\infty}^{\infty} a_i,

where

a_i = 2\sum_{k=i}^{\infty} \frac{P(Y = k + 1/2)}{2k + 1},

which coincides with (1.6.5).

Now consider the set of all distributions with densities of the form

p(x) = \begin{cases} \dfrac{2x}{2k-1}, & k - 1 \le x \le k, \\ 0, & x > k,\ x < k - 1, \end{cases} \qquad k = 1, 2, \dots,

or of the form

p(x) = \begin{cases} \dfrac{2|x|}{2k-1}, & -k \le x \le -(k-1), \\ 0, & x < -k,\ x > -(k-1), \end{cases} \qquad k = 1, 2, \dots

Denote this set by 𝒜. Denote the set of all mixtures of distributions from 𝒜 by ℬ.

THEOREM 1.6.8. Let φ(t) be the characteristic function of a distribution from ℬ. Then the function

f(t) = \frac{t}{2(1 - \cos t)}\int_0^t \varphi(u)\,du, \qquad f(2\pi n) = 1, \quad n = 0, \pm 1, \pm 2, \dots, \qquad (1.6.12)

is the characteristic function of a discrete unimodal (with mode at zero) distribution.
Conversely, the characteristic function of any discrete unimodal (with mode at zero) distribution can be represented in the form (1.6.12), where φ(u) is the characteristic function of a distribution from ℬ.

Proof of this theorem is similar to that of Theorem 1.6.7. Theorem 1.6.7 is based on the following characterization of discrete unimodal distributions (Navard et al., 1993). Denote by U_a the uniform distribution function over the unit interval with the center at a.

(i) The distribution F of an integer-valued random variable is discrete unimodal (with an arbitrary mode) if and only if F * U_a is unimodal (a can be taken arbitrary).

It turns out that the same characterization holds if the uniform distribution is replaced with the triangle distribution (Ushakov, 1998). More exactly, let T_a be the triangle distribution over the interval [a − 1, a + 1], i.e., the distribution with the density

q_a(x) = \begin{cases} x - a + 1, & a - 1 < x \le a, \\ -x + a + 1, & a < x \le a + 1, \\ 0, & |x - a| > 1. \end{cases}

Then

(ii) the distribution F of an integer-valued random variable is discrete unimodal if and only if F * T_a is unimodal.

Theorem 1.6.8 is based on this characterization.
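Characterization (i) can be sketched in code (an illustration, with a = 0 and an arbitrarily chosen probability vector): the density of F * U₀ equals p_k on [k − 1/2, k + 1/2), so the convolution is unimodal exactly when the sequence {p_k} rises to a mode and then falls.

```python
# Characterization (i) with a = 0: the density of F * U_0 at x equals
# p_k for x in [k - 1/2, k + 1/2), so F * U_0 is unimodal exactly when
# the sequence {p_k} is discrete unimodal.

def conv_density(p, x):
    """Density of F * U_0, where F puts mass p[k] at the integer k."""
    total = 0.0
    for k, pk in enumerate(p):
        if -0.5 <= x - k < 0.5:
            total += pk
    return total

def rises_then_falls(vals):
    seen_down = False
    for a, b in zip(vals, vals[1:]):
        if b < a - 1e-12:
            seen_down = True
        elif b > a + 1e-12 and seen_down:
            return False
    return True

grid = [i / 10 for i in range(-5, 55)]       # covers both supports below

uni = [0.1, 0.2, 0.4, 0.2, 0.1]              # masses at k = 0..4, unimodal
not_uni = [0.4, 0.1, 0.5]                    # masses at k = 0..2, not unimodal

ok_uni = rises_then_falls([conv_density(uni, x) for x in grid])
ok_not = rises_then_falls([conv_density(not_uni, x) for x in grid])
print(ok_uni, ok_not)  # True False
```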
In terms of characteristic functions these two characterizations, (i) and (ii), can be formulated as follows:

(1) a characteristic function f(t) is the characteristic function of a discrete unimodal distribution if and only if f(t) \frac{\sin(t/2)}{t/2} e^{ita} (a arbitrary) is the characteristic function of a unimodal distribution;

(2) a characteristic function f(t) is the characteristic function of a discrete unimodal distribution if and only if f(t) \left(\frac{\sin(t/2)}{t/2}\right)^2 e^{ita} (a arbitrary) is the characteristic function of a unimodal distribution.

The following question remains to be answered. Do other distributions exist, different from the uniform and triangle ones, for which this characterization holds? In other words, does there exist a characteristic function g(t), which cannot be represented as \frac{\sin(t/2)}{t/2} e^{ibt} or \left(\frac{\sin(t/2)}{t/2}\right)^2 e^{ibt}, such that a characteristic function f(t) is the characteristic function of a discrete unimodal distribution if and only if f(t)g(t)e^{iat} (a arbitrary) is the characteristic function of a unimodal distribution?
Note that, unlike Theorem 1.6.7, the characteristic function φ(u) in Theorem 1.6.8 'generating' the characteristic function of a discrete unimodal distribution is the characteristic function of an absolutely continuous distribution.
Using Theorems 1.6.6, 1.6.7, and 1.6.8, we obtain the following necessary condition for characteristic functions of discrete unimodal distributions, similar to Theorem 1.6.3 in the continuous case.

THEOREM 1.6.9. Let f(t) be the characteristic function of an integer-valued unimodal distribution. Then f(t) is differentiable for (at least) all t ≠ 2πn, n = 0, ±1, ±2, …
This necessary condition can be applied, in particular, to the analysis of unimodality of high convolutions (Ushakov, 1982; Dharmadhikari & Joag-Dev, 1988).

1.7. Analyticity of characteristic functions

A characteristic function f(t) is said to be an analytic characteristic function if there exists a function φ(z) of the complex variable z = t + iy (t, y real) which is analytic in a circle |z| < R for some R > 0 and such that φ(t) = f(t) for |t| < R. In view of the uniqueness theorem for analytic continuation, the function φ(z) is uniquely determined by f(t); therefore we will use for it the same notation: f(z).
Analytic characteristic functions play an important role in the theory of decomposition of probability distributions as well as in some other fields of probability theory and statistics. The theory of analytic characteristic functions has been deeply developed during the last few decades. In this section, we present only some basic results concerning the analyticity of characteristic functions, mainly those which will be used in this book, for instance, for proving the consistency of some estimators and tests based on the empirical characteristic function. More information about analytic characteristic functions can be found in (Ramachandran, 1967; Lukacs, 1970; Linnik & Ostrovskii, 1977; Lukacs, 1983; Ostrovskii, 1986).

THEOREM 1.7.1. Let F(x) be a distribution function with characteristic function f(t). If

\int_{-\infty}^{\infty} e^{r|x|}\,dF(x) < \infty \qquad (1.7.1)

for all 0 < r < R, R > 0, then f(z) is an analytic characteristic function in the circle |z| < R.
Conversely, if f(z) is an analytic characteristic function in a circle |z| < R, then F(x) satisfies condition (1.7.1) for all 0 < r < R.

This condition can also be formulated as follows.

THEOREM 1.7.2. Let F(x) be a distribution function with characteristic function f(t). If

1 - F(x) + F(-x) = O(e^{-rx}), \quad x \to \infty, \qquad (1.7.2)

for all 0 < r < R, R > 0, then f(z) is an analytic characteristic function in the circle |z| < R.
Conversely, if f(z) is an analytic characteristic function in a circle |z| < R, then F(x) satisfies condition (1.7.2) for all 0 < r < R.
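A hedged numerical illustration of condition (1.7.1): the bilateral exponential (Laplace) density ½e^{−|x|} has characteristic function 1/(1 + t²), analytic in the strip |ℑz| < 1, and accordingly the exponential moment is finite exactly for r < 1, with E e^{r|X|} = 1/(1 − r):

```python
import math

# Laplace density (1/2) exp(-|x|): f(t) = 1/(1 + t^2) is analytic in
# |Im z| < 1, and condition (1.7.1) holds for r < 1:
#   E exp(r|X|) = integral_0^inf exp((r - 1) x) dx = 1/(1 - r).

def exp_moment(r, T=200.0, n=200000):
    """Midpoint-rule approximation of E exp(r|X|) for the Laplace law."""
    h = T / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += math.exp(r * x) * math.exp(-x)   # 2 * (1/2) e^{-x}, x > 0
    return total * h

vals = {r: exp_moment(r) for r in (0.25, 0.5, 0.75)}
print(vals)  # compare with 1/(1 - r)
```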
THEOREM 1.7.3. Let F(x) be a distribution function with characteristic function f(t). If f(t) is an analytic characteristic function, then there exists a strip −a < ℑz < b, a > 0, b > 0, such that f(z) is analytic in this strip and can be represented there as

f(z) = \int_{-\infty}^{\infty} e^{izx}\,dF(x).
THEOREM 1.7.4. Let F(x) be a distribution function with characteristic function f(t). Then f(z) is analytic in the strip −a < ℑz < b, a > 0, b > 0, if and only if

1 - F(x) = O(e^{-rx}), \quad x \to \infty,

for all 0 < r < a, and

F(-x) = O(e^{-rx}), \quad x \to \infty,

for all 0 < r < b.

COROLLARY 1.7.1. Let F(x) be a distribution function with characteristic function f(t). If F(x) satisfies (1.7.2) for all 0 < r < R, R > 0, then f(z) is analytic in a strip containing the strip |ℑz| < R.

Note that the converse follows from Theorem 1.7.2.
Assume that φ(z) is an analytic function in some strip −a < ℑz < b, a > 0, b > 0. It is said to be a ridge function if |φ(t + iy)| ≤ |φ(iy)| for any −a < y < b and all real t.

THEOREM 1.7.5. If a characteristic function f(z) is analytic in some strip −a < ℑz < b, a > 0, b > 0, then it is a ridge function in this strip.

The maximal strip of the form −a < ℑz < b, a > 0, b > 0, where a characteristic function f(z) is analytic is said to be the strip of analyticity of f(z). The strip of analyticity of an analytic characteristic function can be the whole complex plane or a half-plane, or can have one or two horizontal boundary lines. If the strip has a boundary, then the points of intersection of the boundary with the imaginary axis are singular points of the analytic characteristic function. If the strip of analyticity is the whole plane, then the characteristic function is said to be an entire characteristic function.
Let f(t), f₁(t), and f₂(t) be characteristic functions such that f(t) = f₁(t)f₂(t). If f(t) is an analytic characteristic function, then both f₁(t) and f₂(t) are analytic characteristic functions. If f(z) is analytic in a strip −a < ℑz < b, a > 0, b > 0, then f₁(z) and f₂(z) are analytic in this strip. If f(z) is an entire characteristic function, then f₁(t) and f₂(t) are entire characteristic functions.
The methods of the theory of analytic functions have proved to be a very powerful means in the investigation of characteristic functions. One of the most celebrated examples is the following Cramér theorem on the decomposition of the normal law.
THEOREM 1.7.6 (Cramér). The characteristic function of the normal distribution has only normal factors.
In other words, let f(t), f₁(t), and f₂(t) be characteristic functions such that f(t) = f₁(t)f₂(t). If f(t) = exp{iμt − σ²t²/2} is the characteristic function of the normal distribution, then each f_j(t) is the characteristic function of the normal distribution: f_j(t) = exp{iμ_j t − σ_j²t²/2}, j = 1, 2, and μ₁ + μ₂ = μ, σ₁² + σ₂² = σ².

The next theorem is among those results of this section which will be used
below.

THEOREM 1.7.7. If a characteristic function f(t) coincides with an analytic characteristic function g(t) in some (real) neighborhood of the origin, then they coincide for all real t: f(t) ≡ g(t).

In particular, this implies that if f(t) = e^{−ct²} in some neighborhood of the origin, then f(t) = e^{−ct²} for all t. Note that this is not true if the interval where f(t) and e^{−ct²} coincide does not contain the origin (see Appendix A, Example 33).

1.8. Multivariate characteristic functions

We make use of the following notational conventions: vectors in R^m, m > 1, are denoted by bold lowercase letters, random vectors are denoted by bold capital letters, the scalar product of two vectors x and y is denoted by (x, y), and the Euclidean norm of x is denoted by ||x||: ||x|| = √(x, x).
Let X = (X₁, …, X_m) be an m-dimensional random vector with distribution function F(x), x = (x₁, …, x_m) ∈ R^m. The characteristic function f(t) of the distribution function F(x) (or of the random vector X) is defined as

f(\mathbf{t}) = \int_{R^m} e^{i(\mathbf{t},\mathbf{x})}\,dF(\mathbf{x}) = E e^{i(\mathbf{t},\mathbf{X})}.

If F(x) is absolutely continuous with density p(x), then

f(\mathbf{t}) = \int_{R^m} e^{i(\mathbf{t},\mathbf{x})}\,p(\mathbf{x})\,d\mathbf{x}.

If X is discrete with P(X = x_n) = p_n, n = 1, 2, …, Σ_n p_n = 1, then

f(\mathbf{t}) = \sum_{n=1}^{\infty} p_n e^{i(\mathbf{t},\mathbf{x}_n)}.

Many basic properties of multivariate characteristic functions are the same as or very similar to the properties of one-dimensional characteristic functions. A multivariate characteristic function uniquely determines the corresponding distribution. Any multivariate characteristic function f(t) satisfies the conditions

(a) f(t) is uniformly continuous;

(b) f(0) = f(0, …, 0) = 1;

(c) |f(t)| ≤ 1;

(d) f(−t) = \overline{f(\mathbf{t})}, where −t = (−t₁, …, −t_m).

A characteristic function f(t) is real if and only if f(−t) = f(t) for all t ∈ R^m. If X and Y are independent random vectors with characteristic functions f(t) and g(t), respectively, then the characteristic function of X + Y is f(t)g(t). The convolution theorem in R^m is absolutely identical to that in R¹.

THEOREM 1.8.1 (convolution theorem). A distribution function F(x) (in R^m) is the convolution of two distribution functions F₁(x) and F₂(x):

F(\mathbf{x}) = \int_{R^m} F_1(\mathbf{x} - \mathbf{y})\,dF_2(\mathbf{y}) = \int_{R^m} F_2(\mathbf{x} - \mathbf{y})\,dF_1(\mathbf{y})

if and only if f(t) = f₁(t)f₂(t), where f(t), f₁(t), and f₂(t) are the characteristic functions of F(x), F₁(x), and F₂(x), respectively.

Let F₁(x), F₂(x), … be a sequence of distribution functions in R^m. By definition, this sequence converges weakly to a distribution function F(x) (denoted either Lim_{n→∞} F_n(x) = F(x) or F_n(x) →ʷ F(x) as n → ∞) if

\int_{R^m} u(\mathbf{x})\,dF_n(\mathbf{x}) \to \int_{R^m} u(\mathbf{x})\,dF(\mathbf{x}) \quad \text{as } n \to \infty

for any continuous bounded function u(x). The continuity theorem in the multi-dimensional case is also identical to that in the univariate case.

THEOREM 1.8.2 (continuity theorem). Let F₁(x), F₂(x), … be a sequence of distribution functions and f₁(t), f₂(t), … be the corresponding sequence of characteristic functions. The sequence F₁(x), F₂(x), … converges weakly to some distribution function F(x) if and only if the sequence f₁(t), f₂(t), … converges at all points to some function f(t) which is continuous at zero. In this case, f(t) is the characteristic function of F(x).

Let F(x) be a distribution function. We will use the same letter to denote the corresponding distribution. So, for any Borel set B ⊂ R^m,

F(B) = \int_B dF(\mathbf{x}).

A Borel set B is said to be a continuity set of F(x) if F(∂B) = 0, where ∂B is the boundary of B.
As in the univariate case, various variants of the inversion theorem can be obtained for multi-dimensional distributions. Some of them are presented below.

THEOREM 1.8.3. Let F(x) be a distribution function with characteristic function f(t). If A is a continuity set of F(x), then

F(A) = \lim_{T\to\infty} \frac{1}{(2\pi)^m} \int_{[-T,T]^m} \left( \int_A e^{-i(\mathbf{t},\mathbf{u})}\,d\mathbf{u} \right) f(\mathbf{t})\,d\mathbf{t}.

In particular, if A is a ball S_{a,r} with radius r and center at a, then

F(S_{\mathbf{a},r}) = \lim_{T\to\infty} \frac{1}{(2\pi)^{m/2}} \int_{[-T,T]^m} \left( \frac{r}{\|\mathbf{t}\|} \right)^{m/2} J_{m/2}(r\|\mathbf{t}\|)\,e^{-i(\mathbf{t},\mathbf{a})} f(\mathbf{t})\,d\mathbf{t},

where J_p(z) is the Bessel function of order p.
Let a = (a₁, …, a_m) and b = (b₁, …, b_m) be two vectors such that a_i < b_i, i = 1, …, m. Denote

[\mathbf{a}, \mathbf{b}] = \{\mathbf{x} = (x_1, \dots, x_m)\colon a_i \le x_i \le b_i,\ i = 1, \dots, m\}.

THEOREM 1.8.4. Let F(x) be a distribution function with characteristic function f(t). If [a, b], a_i < b_i, i = 1, …, m, is a continuity set of F(x), then

F([\mathbf{a},\mathbf{b}]) = \frac{1}{(2\pi)^m} \lim_{T_1\to\infty} \dots \lim_{T_m\to\infty} \int_{-T_1}^{T_1} \dots \int_{-T_m}^{T_m} \prod_{k=1}^{m} \frac{e^{-it_k a_k} - e^{-it_k b_k}}{it_k}\,f(\mathbf{t})\,dt_1 \dots dt_m.

THEOREM 1.8.5. Let f(t) be an integrable characteristic function. Then the corresponding distribution function F(x) is absolutely continuous, its density function p(x) is bounded and continuous, and

p(\mathbf{x}) = \frac{1}{(2\pi)^m} \int_{R^m} e^{-i(\mathbf{t},\mathbf{x})} f(\mathbf{t})\,d\mathbf{t}.
The Parseval and Parseval–Plancherel theorems are generalized to the multi-dimensional case straightforwardly.

THEOREM 1.8.6 (Parseval). Let F(x) and G(x) be distribution functions with characteristic functions f(t) and g(t), respectively. Then

\int_{R^m} e^{-i(\mathbf{t},\mathbf{x})} g(\mathbf{x})\,dF(\mathbf{x}) = \int_{R^m} f(\mathbf{y} - \mathbf{t})\,dG(\mathbf{y}).

In particular,

\int_{R^m} f(\mathbf{t})\,dG(\mathbf{t}) = \int_{R^m} g(\mathbf{t})\,dF(\mathbf{t}).

THEOREM 1.8.7 (Parseval–Plancherel). Let f(t) be the characteristic function of an absolutely continuous distribution with density function p(x). Then |f(t)|² is integrable if and only if p²(x) is integrable, and, in this case,

\int_{R^m} p^2(\mathbf{x})\,d\mathbf{x} = \frac{1}{(2\pi)^m} \int_{R^m} |f(\mathbf{t})|^2\,d\mathbf{t}.

THEOREM 1.8.8 (Parseval–Plancherel). Let p(x) and q(x) be two probability density functions with characteristic functions f(t) and g(t), respectively. Then

\int_{R^m} [p(\mathbf{x}) - q(\mathbf{x})]^2\,d\mathbf{x} = \frac{1}{(2\pi)^m} \int_{R^m} |f(\mathbf{t}) - g(\mathbf{t})|^2\,d\mathbf{t},

provided that the integrals exist.
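A one-dimensional numerical sanity check of Theorem 1.8.8 (a sketch with m = 1, taking p and q to be the N(0,1) and N(1,1) densities, whose characteristic functions e^{−t²/2} and e^{it−t²/2} are standard):

```python
import math

# Check  integral (p - q)^2 dx = (1/2pi) integral |f - g|^2 dt  for m = 1,
# p = N(0,1), q = N(1,1):  |f(t) - g(t)|^2 = e^{-t^2} * 2 * (1 - cos t).

def midpoint(fun, a, b, n=200000):
    h = (b - a) / n
    return h * sum(fun(a + (i + 0.5) * h) for i in range(n))

def p(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

lhs = midpoint(lambda x: (p(x) - p(x - 1)) ** 2, -10.0, 11.0)
rhs = midpoint(lambda t: math.exp(-t * t) * 2 * (1 - math.cos(t)),
               -12.0, 12.0) / (2 * math.pi)
print(lhs, rhs)  # the two quadratures agree
```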

The Bochner–Khintchine theorem (necessary and sufficient criterion) in the multi-dimensional case is similar to that in R¹. A complex-valued function f(t) defined on R^m is called non-negative definite if it is continuous and the sum

\sum_{j=1}^{N}\sum_{k=1}^{N} f(\mathbf{t}_j - \mathbf{t}_k)\,\xi_j \overline{\xi_k}

is real and non-negative for any positive integer N, any complex ξ₁, ξ₂, …, ξ_N, and any t₁, t₂, …, t_N belonging to R^m.

THEOREM 1.8.9 (Bochner–Khintchine). A complex-valued function f(t) defined on R^m is a characteristic function if and only if

(a) f(t) is non-negative definite;

(b) f(0) = 1.

The following relation between the characteristic function of a random vector and the characteristic functions of its projections is very useful: it often helps to reduce a multi-dimensional problem to an analogous one-dimensional problem.
Let e be a unit vector. The projection of a random vector X on the vector e is defined as the random variable X_e = (e, X). Denote the characteristic functions of X and X_e by f(t) and f_e(t), respectively. Then

f_e(t) = E e^{itX_e} = E e^{it(\mathbf{e},\mathbf{X})} = E e^{i(t\mathbf{e},\mathbf{X})} = f(t\mathbf{e}). \qquad (1.8.1)

In other words, the characteristic function of the projection of a random vector X on a unit vector e is the cross-section of the characteristic function of X along the straight line which is determined by the vector e.
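Relation (1.8.1) can be illustrated directly (a sketch using an arbitrarily chosen example: the uniform distribution on the four points (0,0), (1,0), (0,1), (1,1)):

```python
import cmath, math

# Check f_e(t) = f(t e) for X uniform on {(0,0), (1,0), (0,1), (1,1)}.

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

def f(t1, t2):
    """Characteristic function of X."""
    return sum(cmath.exp(1j * (t1 * x1 + t2 * x2)) for x1, x2 in points) / 4

e = (1 / math.sqrt(2), 1 / math.sqrt(2))   # unit vector

def f_proj(t):
    """Characteristic function of the projection X_e = (e, X)."""
    return sum(cmath.exp(1j * t * (e[0] * x1 + e[1] * x2))
               for x1, x2 in points) / 4

max_err = max(abs(f_proj(t) - f(t * e[0], t * e[1]))
              for t in [0.1 * k for k in range(-50, 51)])
print(max_err)  # essentially zero
```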
Many criteria of Section 1.3 can be easily extended to the multi-dimensional case. In particular, Theorems 1.3.3, 1.3.4, 1.3.6–1.3.8, 1.3.10, and 1.3.15 are true in the multi-dimensional case with almost the same formulations as in the one-dimensional case. Their proofs do not differ much from those in the univariate case. At the same time, Theorem 1.3.9 does not hold in the multi-dimensional case if we mean the following natural multivariate generalization of the Polya class of univariate characteristic functions: the spherically symmetric continuous functions satisfying the conditions

(a) f(0) = 1;

(b) for any unit vector e, f_e(t) = f(te) is convex for t > 0;

(c) lim_{||t||→∞} f(t) = 0.

There are functions in R^m satisfying these conditions which are not characteristic functions (see Appendix A, Example 37). Nevertheless, some multi-dimensional analogs of the Polya criterion were obtained (see, e.g., (Velikoivanenko, 1987; Velikoivanenko, 1992)). Unfortunately, many multi-dimensional analogs lose the main advantage of the Polya criterion, namely its remarkable simplicity. A nice exception is the Askey theorem (Askey, 1973); see also (Trigub, 1989).

THEOREM 1.8.10. Let f(t) be a real-valued continuous spherically symmetric function defined on R^m such that

(a) f(0) = 1;

(b) for any unit vector e, f_e(t) = f(te) is k = [m/2] times differentiable, where [·] denotes the greatest integer less than or equal to the argument, and (−1)^k f_e^{(k)}(t) is convex for t > 0;

(c) lim_{||t||→∞} f(t) = 0.

Then f(t) is the characteristic function of a (spherically symmetric) m-dimensional distribution.

Making use of (1.8.1), we immediately obtain multi-dimensional generalizations of Theorems 1.4.1 and 1.4.4, and of the first inequality of Theorem 1.4.5, as direct corollaries to the corresponding one-dimensional results.

THEOREM 1.8.11. For any characteristic function f(t) and any positive integer n,

1 - \Re f(n\mathbf{t}) \le n\{1 - [\Re f(\mathbf{t})]^n\} \le n^2[1 - \Re f(\mathbf{t})]

for all t ∈ R^m.

THEOREM 1.8.12. Let f(t) be a characteristic function such that |f(t)| ≤ c for ||t|| ≥ b, where c < 1, b > 0. Then

|f(\mathbf{t})| \le 1 - \frac{1 - c^2}{8b^2}\,\|\mathbf{t}\|^2

for ||t|| ≤ b.

THEOREM 1.8.13. For any characteristic function f(t),

|\Im f(\mathbf{t})| \le \sqrt{1 - \Re f(2\mathbf{t})}.

Note that, generally speaking, the second inequality of Theorem 1.4.5 does not hold in the multi-dimensional case. Indeed, consider the characteristic function f(t) = f(t₁, t₂) = e^{−t₂²/2} (the corresponding distribution is concentrated on the y-axis). Let s and t be arbitrary real numbers such that s ≠ 0 and t ≠ 0. Set s = (s, 0) and t = (0, t). Then

|f(\mathbf{t} + \mathbf{s}) - f(\mathbf{s})|^2 = |f(s, t) - f(s, 0)|^2 = (1 - e^{-t^2/2})^2

and

1 - \Re f(\mathbf{s}) = 1 - f(s, 0) = 0;

hence

|f(\mathbf{t} + \mathbf{s}) - f(\mathbf{s})|^2 > 2[1 - \Re f(\mathbf{s})].

The following multi-dimensional analog of Theorem 1.4.2 is not a consequence of the one-dimensional result, but can be proved in exactly the same way as in the one-dimensional case.

THEOREM 1.8.14. For any characteristic function f(t) and any t₁ and t₂,

|f(t₁ + t₂)| ≥ |f(t₁)||f(t₂)| − (1 − |f(t₁)|²)^{1/2}(1 − |f(t₂)|²)^{1/2}.

COROLLARY 1.8.1. Let f(t) be a characteristic function. If |f(t₁)| ≥ cos φ₁ and |f(t₂)| ≥ cos φ₂, where φ₁ ≥ 0, φ₂ ≥ 0, and φ₁ + φ₂ ≤ π/2, then

|f(t₁ + t₂)| ≥ cos(φ₁ + φ₂).

A multi-dimensional generalization of Theorem 1.4.6 is given in Chapter 2. Theorem 1.4.8 admits various multi-dimensional analogs; they are also presented in Chapter 2.
Theorems 1.8.11–1.8.14 can be considered as necessary conditions for multivariate characteristic functions.
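These necessary conditions are easy to probe numerically. The following sketch (with an assumed bivariate normal test case, not taken from the book) checks the outer estimate of Theorem 1.8.11 (with n = 3), Theorem 1.8.13 and Theorem 1.8.14 on a grid of arguments:

```python
import cmath
import itertools
import math

# Characteristic function of a 2-D normal N(a, C); the parameters below are
# assumed test values, not taken from the book.
a = (0.5, -1.0)
C = ((2.0, 0.6), (0.6, 1.0))

def f(t1, t2):
    lin = a[0] * t1 + a[1] * t2
    quad = C[0][0] * t1 * t1 + 2 * C[0][1] * t1 * t2 + C[1][1] * t2 * t2
    return cmath.exp(1j * lin - 0.5 * quad)

grid = [k / 3 for k in range(-6, 7)]
eps = 1e-12
violations = 0
for t1, t2 in itertools.product(grid, grid):
    ft = f(t1, t2)
    # Theorem 1.8.11 (outer estimate) with n = 3:
    if 1 - f(3 * t1, 3 * t2).real > 9 * (1 - ft.real) + eps:
        violations += 1
    # Theorem 1.8.13:
    if abs(ft.imag) > math.sqrt(max(0.0, 1 - f(2 * t1, 2 * t2).real)) + eps:
        violations += 1
points = list(itertools.product(grid, grid))
for (s1, s2), (t1, t2) in itertools.product(points, points):
    fs, ft = abs(f(s1, s2)), abs(f(t1, t2))
    lower = fs * ft - math.sqrt((1 - fs * fs) * (1 - ft * ft))
    # Theorem 1.8.14:
    if abs(f(s1 + t1, s2 + t2)) < lower - eps:
        violations += 1
```

No violations occur over the grid, in agreement with the theorems.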

Inequalities concerning the closeness of multivariate distributions in terms of the closeness of the corresponding characteristic functions are usually much more complicated than in the univariate case. Here we point out only an estimate for the uniform distance between densities which, as in the one-dimensional case, is a simple consequence of the inversion theorem for densities (Theorem 1.8.5): if p(x) and q(x) are two probability density functions of m-dimensional distributions, and f(t) and g(t) are the corresponding characteristic functions, then

sup_x |p(x) − q(x)| ≤ (2π)^{−m} ∫_{R^m} |f(t) − g(t)| dt.

Let X = (X₁, …, X_m) be a random vector in R^m with distribution function F(x). The moments of order k (k is a positive integer) of X are defined as

α_{k₁…k_m} = E(X₁^{k₁} ⋯ X_m^{k_m}),

where k₁ + … + k_m = k (the k_j are non-negative integers, j = 1, …, m; some of them can be equal to zero).

THEOREM 1.8.15. Let F(x), x ∈ R^m, be a distribution function with characteristic function f(t). If all moments of order k exist, then all partial derivatives of order k of f(t) exist and

∂^k f(t) / (∂t₁^{k₁} ⋯ ∂t_m^{k_m}) |_{t=0} = i^k α_{k₁…k_m}.

The theorem implies, in particular, that

α_{k₁…k_m} = (−i)^k ∂^k f(t) / (∂t₁^{k₁} ⋯ ∂t_m^{k_m}) |_{t=0}.
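As a numerical illustration of Theorem 1.8.15 (a sketch with assumed parameters): for the bivariate normal characteristic function, the mixed moment α₁₁ = E(X₁X₂) = c₁₂ + a₁a₂ can be recovered from a central-difference approximation of ∂²f/∂t₁∂t₂ at t = 0:

```python
import cmath

a1, a2 = 0.5, -1.0              # mean vector (assumed test values)
c11, c12, c22 = 2.0, 0.6, 1.0   # covariance entries (assumed test values)

def f(t1, t2):
    return cmath.exp(1j * (a1 * t1 + a2 * t2)
                     - 0.5 * (c11 * t1 * t1 + 2 * c12 * t1 * t2 + c22 * t2 * t2))

h = 1e-3
# Central difference for the mixed derivative d^2 f / (dt1 dt2) at t = 0.
d2 = (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4 * h * h)
# By Theorem 1.8.15, d2 = i^2 * alpha_11, hence alpha_11 = -d2.
alpha11 = -d2
exact = c12 + a1 * a2  # E(X1 X2) = Cov(X1, X2) + EX1 * EX2
```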

THEOREM 1.8.16. Let F(x), x ∈ R^m, be a distribution function with characteristic function f(t). If all partial derivatives of order 2k (k is a positive integer) of f(t) exist, then the moments of order 2k of F(x) exist.

Recall that this result is not true (even in the univariate case) if the order of the derivatives is not even.

THEOREM 1.8.17. Let f(t) = f(t₁, …, t_m) be a characteristic function. If

∂^{2k} f(t) / (∂t₁^{2k₁} ⋯ ∂t_m^{2k_m})

exists (k₁ + … + k_m = k; the k_j are non-negative integers), then the function

g(t) = [∂^{2k} f(t) / (∂t₁^{2k₁} ⋯ ∂t_m^{2k_m})] / [∂^{2k} f(t) / (∂t₁^{2k₁} ⋯ ∂t_m^{2k_m})]_{t=0}

is a characteristic function.

Let X be a random vector in R^m with distribution F. The distribution F_e of the projection (e, X) of X on a unit vector e is said to be the projection of the distribution F. Let X have the normal distribution, i.e., the distribution with probability density

p(x) = p(x₁, …, x_m) = (2π)^{−m/2} (det C)^{−1/2} exp{−(1/2) Σ_{j=1}^m Σ_{k=1}^m a_{jk}(x_j − a_j)(x_k − a_k)},

where a = (a₁, …, a_m) is the expectation of X, C = (c_{jk}) is the covariance matrix of X, and A = (a_{jk}) is the matrix inverse to C: A = C^{−1}. The characteristic function of X is

f(t) = exp{i Σ_{j=1}^m a_j t_j − (1/2) Σ_{j=1}^m Σ_{k=1}^m c_{jk} t_j t_k}.

THEOREM 1.8.18. An m-variate probability distribution is normal if and only if all its univariate projections are normal.
In other words, an m-variate characteristic function f(t) is the characteristic function of a normal distribution if and only if f_e(t) is a univariate normal characteristic function for every unit vector e.
A distribution F in R^m is said to be spherically symmetric with respect to a point a ∈ R^m if it is invariant under rotations about a. Here we present some results concerning the characteristic functions of spherically symmetric distributions. A systematic analysis of this class of distributions is given in (Fang et al., 1989). Without loss of generality, it is assumed that a = 0 in the remainder of the section.
Let X be a random vector in R^m whose distribution F is spherically symmetric (we will call such vectors spherically symmetric). The distribution F can be represented as (Schoenberg, 1938)

F = ∫₀^∞ U^{(r)} dH(r), (1.8.2)

where U^{(r)} is the uniform distribution on the surface of the m-dimensional sphere of radius r with center at zero, and H(r) is some distribution function.

Each one-dimensional projection of U^{(r)} (these projections are identical) for m ≥ 2 is an absolutely continuous distribution with the density

u^{(r)}(x) = Γ(m/2) / (r√π Γ((m − 1)/2)) (1 − x²/r²)^{(m−3)/2} for |x| ≤ r, and u^{(r)}(x) = 0 for |x| > r. (1.8.3)

It is easy to see that the one-dimensional projections of U^{(r)} are symmetric and unimodal when m ≥ 3. Thus, by virtue of (1.8.2), we obtain the following useful property of spherically symmetric distributions.
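Density (1.8.3) is easy to confirm by simulation: for m = 3 the projection is uniform on [−r, r] (so E|X̄| = r/2), and for m = 5, r = 1 the density is (3/4)(1 − x²), giving E|X̄| = 3/8. A Monte Carlo sketch (sample sizes are assumed):

```python
import math
import random

random.seed(1)

def first_coord_uniform_sphere(m, n):
    """First coordinate of n uniform points on the unit sphere in R^m,
    obtained by normalizing Gaussian vectors."""
    vals = []
    for _ in range(n):
        g = [random.gauss(0.0, 1.0) for _ in range(m)]
        vals.append(g[0] / math.sqrt(sum(x * x for x in g)))
    return vals

n = 100_000
mean_abs_3 = sum(abs(x) for x in first_coord_uniform_sphere(3, n)) / n  # E|x| = 1/2
mean_abs_5 = sum(abs(x) for x in first_coord_uniform_sphere(5, n)) / n  # E|x| = 3/8
```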

THEOREM 1.8.19. For m ≥ 3, the one-dimensional projections of an m-dimensional spherically symmetric distribution are unimodal.

This theorem implies (in view of Theorem 1.6.3) the following important
necessary condition for characteristic functions of spherically symmetric dis-
tributions.

THEOREM 1.8.20. Let f(t) be the characteristic function of an m-dimensional spherically symmetric distribution, m ≥ 3. Then for any unit vector e ∈ R^m, the function f_e(t) = f(te) is differentiable at least for all t ≠ 0.

This theorem shows, in particular, that the Polya criterion (Theorem 1.3.9) cannot be extended to the multivariate case straightforwardly (see Appendix A, Example 37), at least for m ≥ 3. Actually, as shown in (Velikoivanenko, 1987), the same situation takes place in the case m = 2.
Another consequence of Theorem 1.8.19 is (due to Theorem 1.6.1) the following representation of the characteristic functions of spherically symmetric distributions for m ≥ 3: each of these functions can be represented as

f(t) = (1/‖t‖) ∫₀^{‖t‖} g(u) du,

where g(u) is a univariate characteristic function.


A spherically symmetric distribution F in R^m is said to be unimodal if it can be represented as follows (recall that we consider only distributions which are spherically symmetric about the origin):

F(x) = pE(x) + (1 − p)G(x),

where 0 ≤ p ≤ 1, E(x) is the distribution degenerate at zero, and G(x) is an absolutely continuous distribution whose density g(x) satisfies the condition g(x₁) ≥ g(x₂) whenever ‖x₁‖ ≤ ‖x₂‖.

Let F be a unimodal spherically symmetric distribution in R^m. It can be represented in the form

F = ∫₀^∞ V^{(r)} dH(r), (1.8.4)

where V^{(r)} is the uniform distribution in the m-dimensional sphere (in the ball) of radius r with center at zero, and H(r) is some distribution function. Each one-dimensional projection of V^{(r)}, m ≥ 1, is an absolutely continuous distribution with the density

v^{(r)}(x) = Γ((m + 2)/2) / (r√π Γ((m + 1)/2)) (1 − x²/r²)^{(m−1)/2} for |x| ≤ r, and v^{(r)}(x) = 0 for |x| > r. (1.8.5)
Taking into account that any spherically symmetric distribution is completely determined by its univariate marginal distributions, and comparing (1.8.3) and (1.8.5), we come to the conclusion that the uniform distribution in the m-dimensional sphere (ball) of radius r is the m-dimensional marginal distribution of the uniform distribution on the surface of the (m + 2)-dimensional sphere of the same radius. This implies the following theorem, which is a generalization of Theorem 1.8.19.

THEOREM 1.8.21. A distribution in R^{m+2}, m ≥ 1, is spherically symmetric if and only if its m-dimensional marginal distributions are unimodal spherically symmetric in R^m.
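The sphere-ball relation can also be checked by simulation: the first two coordinates of a uniform point on the unit sphere in R⁴ should be uniform in the unit disk, so the squared radius R² of the 2-dimensional marginal should be uniform on [0, 1] (ER² = 1/2, ER⁴ = 1/3). A sketch:

```python
import random

random.seed(2)

n = 100_000
m1 = m2 = 0.0
for _ in range(n):
    g = [random.gauss(0.0, 1.0) for _ in range(4)]
    s = sum(x * x for x in g)
    r2 = (g[0] * g[0] + g[1] * g[1]) / s  # squared radius of the 2-D marginal
    m1 += r2
    m2 += r2 * r2
m1 /= n  # should approach E R^2 = 1/2
m2 /= n  # should approach E R^4 = 1/3
```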

The characteristic function of the uniform distribution on the m-dimensional sphere of radius r (centered at zero) is

û^{(r)}(t) = Γ(m/2) (2/(r‖t‖))^{m/2−1} J_{m/2−1}(r‖t‖), (1.8.6)

where J_ν denotes the Bessel function of the first kind. The characteristic function of the uniform distribution in the m-dimensional sphere (ball) of radius r is

v̂^{(r)}(t) = Γ(m/2 + 1) (2/(r‖t‖))^{m/2} J_{m/2}(r‖t‖). (1.8.7)

Therefore, by (1.8.2) and (1.8.4) we obtain the following (quite similar) char-
acterizations for spherically symmetric and unimodal spherically symmetric
distributions respectively.
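For m = 3, formula (1.8.6) reduces, via J_{1/2}(z) = √(2/(πz)) sin z, to sin(r‖t‖)/(r‖t‖); a Monte Carlo average over the unit sphere reproduces this value (a sketch with an assumed test argument):

```python
import math
import random

random.seed(3)

t = (1.0, 2.0, 2.0)                       # assumed test argument, ||t|| = 3
norm_t = math.sqrt(sum(x * x for x in t))

n = 100_000
acc_re = acc_im = 0.0
for _ in range(n):
    g = [random.gauss(0.0, 1.0) for _ in range(3)]
    r = math.sqrt(sum(x * x for x in g))
    dot = sum(ti * gi for ti, gi in zip(t, g)) / r
    acc_re += math.cos(dot)
    acc_im += math.sin(dot)

cf_mc = complex(acc_re / n, acc_im / n)
cf_exact = math.sin(norm_t) / norm_t      # (1.8.6) for m = 3, r = 1
```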

THEOREM 1.8.22. A function f(t), t ∈ R^m, m ≥ 2, is the characteristic function of a spherically symmetric distribution if and only if it can be represented as

f(t) = Γ(m/2) ∫₀^∞ (2/(r‖t‖))^{m/2−1} J_{m/2−1}(r‖t‖) dH(r),

where H(r) is a univariate distribution function concentrated on the positive half-line.

THEOREM 1.8.23. A function f(t), t ∈ R^m, m ≥ 2, is the characteristic function of a unimodal spherically symmetric distribution if and only if it can be represented as

f(t) = Γ(m/2 + 1) ∫₀^∞ (2/(r‖t‖))^{m/2} J_{m/2}(r‖t‖) dG(r),

where G(r) is a univariate distribution function concentrated on the positive half-line.

A natural extension of the notion of spherical symmetry is the so-called α-symmetry. An m-dimensional random vector X = (X₁, …, X_m) is said to have an α-symmetric distribution, α > 0, if its characteristic function f(t) is of the form

f(t) = f(t₁, …, t_m) = φ(|t₁|^α + … + |t_m|^α)

for some function φ. Thus spherically symmetric distributions are 2-symmetric. Concerning α-symmetric characteristic functions see, for example, (Cambanis et al., 1983; Gneiting, 1998).
Some additional information on multivariate characteristic functions can be found in (Feller, 1971; Cuppens, 1975; Bhattacharya & Ranga Rao, 1976; Linnik & Ostrovskii, 1977; Fang et al., 1989).

1.9. Notes
Characteristic functions have been used in probability theory for a long time: almost two centuries ago, Laplace employed them in his work concerning a limit theorem for independent, uniformly distributed random variables. At the beginning of the twentieth century, Lyapunov made wide use of the method of characteristic functions in studying limit theorems. A systematic study of the properties of characteristic functions was started by Lévy (see (Lévy, 1925)). The modern state of the theory of characteristic functions is reflected in a number of books: (Gnedenko & Kolmogorov, 1954; Lukacs & Laha, 1964; Ramachandran, 1967; Lukacs, 1970; Feller, 1971; Ibragimov & Linnik, 1971; Kawata, 1972; Cuppens, 1975; Petrov, 1975; Linnik & Ostrovskii, 1977; Loève, 1977; Lukacs, 1983; Galambos, 1988; Prokhorov & Rozanov, 1969).
Most results presented in Chapter 1 are well known and can be found, together with proofs, in many textbooks and monographs, first of all in those mentioned above. Accordingly, historical and bibliographical information concerning these results can be derived from those sources as well. For this reason, here we touch upon only some results, mainly recent or less known ones.

Theorems 1.3.5 and 1.3.9 are due to (Berman, 1975). Some close results were obtained in (Velikoivanenko, 1987; Velikoivanenko, 1992). Theorem 1.3.6 was obtained in (Trigub, 1989).
Theorem 1.4.1 is due to (Heathcote & Pitman, 1972). Theorem 1.4.2 was derived in (Postnikova & Yudin, 1977). A weaker variant of Theorem 1.4.4 was originally obtained by Cramér (see (Cramér, 1962)) and later improved in (Postnikova & Yudin, 1977). Theorem 1.4.5 was obtained in (Raikov, 1940). Theorem 1.4.6 was obtained by Loève (see (Loève, 1977)). Theorem 1.4.9 is due to (Esseen, 1945); there are some generalizations of this inequality (see, e.g., (Petrov, 1975)). Theorem 1.4.10 is due to (Tsaregradskii, 1958). Theorem 1.4.12 (Theorem 1.4.11 is its particular case) was obtained in (Meshalkin & Rogozin, 1962). Theorems 1.4.13, 1.4.14, and 1.4.15 are due, respectively, to (Zolotarev, 1970; Zolotarev, 1971), (Zolotarev & Senatov, 1975), and (Esseen, 1945). Proposition 1.4.1 is due to (Kallenberg, 1974).
Theorem 1.6.5 was obtained in (Askey, 1975). Theorem 1.6.6 is due to (Medgyessy, 1972); Theorems 1.6.7 and 1.6.8 are due to (Ushakov, 1998). Theorem 1.8.14 is due to (Postnikova & Yudin, 1977).
2

Inequalities

In this chapter, we present various inequalities and estimates for characteristic functions. These inequalities can be useful in the investigation of limit theorems, in nonparametric statistics (some of these applications will be given in Chapter 3), and in various problems related to stability. In all sections except Section 2.7, the univariate case is considered.

2.1. Auxiliary results


This section contains some auxiliary results which will be used in other sections
of the chapter. First we prove some elementary inequalities. Denote

h₁(x) = (1 − cos x)/x,  h₂(x) = 2(1 − cos x)/x². (2.1.1)

The following estimates are true:

cos x ≤ 1 − (x²/2) h₂(c), |x| ≤ c, (2.1.2)

for any 0 < c ≤ 2π;

1 − x²/2 ≥ (1 − c²/2)^{x²/c²}, |x| ≤ c, (2.1.3)

for any 0 < c < √2;

1 − x²/2 ≥ e^{−x²/2 − x⁴/2} (2.1.4)

for any |x| ≤ 1;

h₂(x) ≥ 1 − x²/12 (2.1.5)

for all x;

sin x / x ≤ 1 − (x²/6) h₂(c), |x| ≤ c, (2.1.6)

for any 0 < c ≤ 2π;

cos x ≤ 1 − (4/π²) x² (2.1.7)

for |x| ≤ π/2;

cos x ≥ 1 − x²/2 (2.1.8)

for all x;

cos x ≤ sin x / x (2.1.9)

for |x| ≤ π.


To prove (2.1.2), we rewrite it in the form

(1 − cos c)/c² ≤ (1 − cos x)/x²,

or

(sin(x/2) / (x/2))² ≥ (sin(c/2) / (c/2))².

Now it suffices to observe that the function sin x / x decreases for |x| ≤ π and is non-negative in this interval.
non-negative in this interval.
To prove (2.1.3), we rewrite it in the form (without loss of generality assume that x > 0)

(2/x²) ln(1 − x²/2) ≥ (2/c²) ln(1 − c²/2).

Thus it suffices to prove that the function

(2/x²) ln(1 − x²/2)

decreases in the interval (0, √2); this follows from the behavior of the functions 2/x² and ln(1 − x²/2) on this interval.
To prove (2.1.4), let us use the elementary inequality ln(1 + z) ≤ z, which is true for any z > −1. Then we obtain (for 0 ≤ x ≤ 1)

ln(1/(1 − x²/2)) = ln(1 + x²/(2 − x²)) ≤ x²/(2 − x²) ≤ x²/2 + x⁴/2,

which is equivalent to (2.1.4).

To prove (2.1.5), we rewrite it in the form

cos x ≤ 1 − x²/2 + x⁴/24. (2.1.10)

Using the Taylor expansion for cos x, we obtain

cos x = 1 − x²/2 + cos^{(IV)}(θx) x⁴/24 = 1 − x²/2 + cos(θx) x⁴/24 ≤ 1 − x²/2 + x⁴/24,

where |θ| ≤ 1, which coincides with (2.1.10).


Let us prove (2.1.6). Since both sides of the inequality are even functions, it suffices to prove that

v(x) = x − (x³/6) h₂(c) − sin x ≥ 0

for 0 ≤ x ≤ c. We have

v′(x) = 1 − (x²/2) h₂(c) − cos x ≥ 0

due to (2.1.2). Hence v(x) is non-decreasing for 0 ≤ x ≤ c. Taking into account that v(0) = 0, we see that v(x) ≥ 0 for 0 ≤ x ≤ c.
Let us prove (2.1.7). As in the previous case, assume without loss of generality that x ≥ 0. Set

v(x) = 1 − (4/π²) x² − cos x.

We have v(0) = 0 and v(π/2) = 0. Consider the derivative of v(x):

v′(x) = sin x − (8/π²) x.

Since the function sin x / x decreases on the interval [0, π/2], v′(x) has only one root in this interval; therefore either v(x) ≥ 0 for all x ∈ [0, π/2] or v(x) ≤ 0 for all x ∈ [0, π/2]. Let us demonstrate that the first relation holds. Indeed, we have

v′(x) = x (sin x / x − 8/π²) > 0

provided that x is sufficiently small (since sin x / x → 1 > 8/π² as x → 0), i.e., for some a > 0, v(x) ≥ 0 for x ∈ [0, a].

To prove (2.1.8), consider the function

v(x) = cos x − 1 + x²/2.

We have v(0) = 0 and v′(x) = x − sin x ≥ 0 for x ≥ 0, which implies (2.1.8).
Finally, let us prove (2.1.9). Without loss of generality, we may assume that x ≥ 0. Consider the function

v(x) = sin x − x cos x.

We have v(0) = 0 and v′(x) = x sin x ≥ 0 for 0 ≤ x ≤ π. These obviously imply that v(x) is non-negative on the interval (0, π), i.e., (2.1.9) is valid.
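Inequalities (2.1.2) and (2.1.5)–(2.1.9) can be verified numerically on a grid (a sketch; h₂ is the function from (2.1.1)):

```python
import math

def h2(x):
    return 1.0 if x == 0 else 2 * (1 - math.cos(x)) / (x * x)

eps = 1e-12
bad = 0
for c in (0.5, 1.0, 2.0, math.pi, 2 * math.pi):
    for k in range(1, 400):
        x = c * k / 400
        if math.cos(x) > 1 - (x * x / 2) * h2(c) + eps:       # (2.1.2)
            bad += 1
        if math.sin(x) / x > 1 - (x * x / 6) * h2(c) + eps:   # (2.1.6)
            bad += 1
for k in range(-400, 401):
    x = 6.0 * k / 400
    if h2(x) < 1 - x * x / 12 - eps:                          # (2.1.5)
        bad += 1
    if math.cos(x) < 1 - x * x / 2 - eps:                     # (2.1.8)
        bad += 1
    if abs(x) <= math.pi / 2 and math.cos(x) > 1 - 4 * x * x / math.pi ** 2 + eps:  # (2.1.7)
        bad += 1
    if 0 < abs(x) <= math.pi and math.cos(x) > math.sin(x) / x + eps:  # (2.1.9)
        bad += 1
```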
A set of distributions and the set of the corresponding characteristic functions (denote the latter by ℱ) are said to be closed with respect to translation if the inclusion f(t) ∈ ℱ implies the inclusion f(t)e^{itb} ∈ ℱ for any real b. It is easy to see that a class of characteristic functions is closed with respect to translation if and only if it can be represented in the form

ℱ = ⋃_{α∈I} ℱ_α,

where I is some set of indices and for each ℱ_α the corresponding set of distributions is an additive type (a set of distributions is called the additive type of a distribution F if it consists of all distribution functions of the form F(x − a), −∞ < a < ∞; see (Prokhorov, 1965; Kagan et al., 1973)). The following lemma is frequently used in this chapter.

LEMMA 2.1.1. Let ℱ be a class of characteristic functions closed with respect to translation, let B be an arbitrary subset of the real line, and let g(t) be a real-valued function defined on B. If for any f ∈ ℱ and any t ∈ B,

|ℜf(t)| ≤ g(t),

then

|f(t)| ≤ g(t), t ∈ B.

PROOF. Let us fix an arbitrary f ∈ ℱ and an arbitrary t ∈ B. Then, since ℱ is closed with respect to translation, for any b,

|ℜ(f(t)e^{itb})| ≤ g(t).

Let us choose b = b(t) so that

ℑ(f(t)e^{itb}) = 0,

i.e., in accordance with the condition

ℜf(t) sin tb = −ℑf(t) cos tb

(this is always possible because the equation a sin x = b cos x has roots x for any a and b). So, we obtain

|f(t)| = |f(t)e^{itb}| = |ℜ(f(t)e^{itb})| ≤ g(t).

This proves the lemma because t and f are arbitrary.

The two lemmas below are very close to each other but are presented separately for convenience.

LEMMA 2.1.2. Let φ(x), −∞ < x < ∞, be a real-valued non-negative function, symmetric about some point c and non-increasing for x > c, and let X and Y be random variables. If

P(|X − c| < a) ≤ P(|Y − c| < a)

for any a > 0, then

Eφ(X) ≤ Eφ(Y).

PROOF. Without loss of generality we can assume that c = 0. In addition, we can suppose that

lim_{|x|→∞} φ(x) = 0,

because otherwise φ(x) = φ₀(x) + φ_∞, where

φ_∞ = lim_{|x|→∞} φ(x),  lim_{|x|→∞} φ₀(x) = 0,

and

Eφ(X) = Eφ₀(X) + φ_∞,  Eφ(Y) = Eφ₀(Y) + φ_∞.

Observe that

Eφ(X) = Eφ(|X|) and Eφ(Y) = Eφ(|Y|).

Set

D(x) = P(|Y| < x) − P(|X| < x).

Then D(x) ≥ 0 and D(0) = 0; therefore

Eφ(Y) − Eφ(X) = Eφ(|Y|) − Eφ(|X|) = ∫₀^∞ φ(x) dD(x) = −∫₀^∞ D(x) dφ(x) ≥ 0,

because φ(x) is non-increasing for x > 0 and hence

∫₀^∞ D(x) dφ(x) ≤ 0.

LEMMA 2.1.3. Let f₁(x), f₂(x) and g(x) be integrable functions defined on the interval [a, b]. Suppose that g(x) is non-increasing on [a, b],

∫_a^b f₁(x) dx = ∫_a^b f₂(x) dx,

and there exists c ∈ (a, b) such that

f₁(x) ≥ f₂(x)

for x ∈ [a, c), and

f₁(x) ≤ f₂(x)

for x ∈ [c, b]. Then

∫_a^b f₁(x) g(x) dx ≥ ∫_a^b f₂(x) g(x) dx.

PROOF. It is clear that

∫_a^b f₁(x) g(x) dx − ∫_a^b f₂(x) g(x) dx
= ∫_a^b [f₁(x) − f₂(x)] g(x) dx
= ∫_a^c [f₁(x) − f₂(x)] g(x) dx − ∫_c^b [f₂(x) − f₁(x)] g(x) dx
≥ g(c) ∫_a^c [f₁(x) − f₂(x)] dx − g(c) ∫_c^b [f₂(x) − f₁(x)] dx
= g(c) ∫_a^b [f₁(x) − f₂(x)] dx = 0,

which proves the lemma.
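A quick numerical illustration of Lemma 2.1.3 (with assumed f₁, f₂, g): on [0, 1], f₁(x) = 1.5 − x and f₂(x) ≡ 1 have equal integrals and cross once at x = 0.5, and g(x) = e^{−x} is non-increasing:

```python
import math

N = 10_000
h = 1.0 / N
s1 = s2 = 0.0
for k in range(N):
    x = (k + 0.5) * h        # midpoint rule on [0, 1]
    g = math.exp(-x)         # non-increasing weight
    s1 += (1.5 - x) * g * h  # integral of f1 * g
    s2 += 1.0 * g * h        # integral of f2 * g
```

As the lemma predicts, s1 exceeds s2.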

Let f(x) be a non-negative continuous function defined on the interval [a, b]. Let us introduce the functional

ℱ_a^b(f, λ) = sup_{B∈𝔅(λ)} ∫_B f(x) dx,

where 𝔅(λ) is the class of all Borel subsets of the interval [a, b] whose Lebesgue measure is equal to λ (it is assumed that λ ≤ b − a). We need the following properties of the functional ℱ_a^b.

LEMMA 2.1.4. Let a be a real number, and let b, c and d be positive numbers such that

b ≥ π/(2c),  cd ≥ 1/2.

Then

ℱ_{a−c}^{a+c}(d cos bx, 1/d) ≤ (4cd/π) sin(π/(4cd)).

PROOF. Without loss of generality, one can suppose that c = π/2 (the general case can be reduced to this one by a change of variables). Let 0 < λ ≤ π. Then, for any real a and any b ≥ 1,

ℱ_{a−π/2}^{a+π/2}(d cos bx, λ) ≤ ℱ_{−π/2}^{π/2}(d cos bx, λ) ≤ ℱ_{−π/2}^{π/2}(d cos x, λ);

hence, for d ≥ 1/π,

ℱ_{−π/2}^{π/2}(d cos bx, 1/d) ≤ ℱ_{−π/2}^{π/2}(d cos x, 1/d) = ∫_{−1/(2d)}^{1/(2d)} d cos x dx = 2d sin(1/(2d)).


LEMMA 2.1.5. Let f(x) be a probability density function and let g(x) be a continuous function. If

sup_x f(x) ≤ c

and

∫_a^b f(x) dx = 1

for some a and b, then

∫_a^b f(x) g(x) dx ≤ ℱ_a^b(cg, 1/c).

PROOF. Let us set

B_λ = {x ∈ [a, b]: g(x) ≥ λ}

and denote the probability density function of the uniform distribution on B_λ by u_λ(x). Choose λ so that the Lebesgue measure of B_λ is 1/c. Then

∫_a^b f(x) g(x) dx ≤ ∫_a^b u_λ(x) g(x) dx = ∫_{B_λ} u_λ(x) g(x) dx ≤ ℱ_a^b(cg, 1/c).


LEMMA 2.1.6. Let m ≥ 2. Then

Γ(m/2 + 1) / Γ(m/2) = m/2.

PROOF. If m is even, then

Γ(m/2 + 1)/Γ(m/2) = (m/2) Γ(m/2)/Γ(m/2) = m/2.

Let m be odd. Set n = (m − 1)/2. Then

Γ(m/2 + 1)/Γ(m/2) = Γ(n + 3/2)/Γ(n + 1/2) = [1·3·⋯·(2n + 1) √π / 2^{n+1}] / [1·3·⋯·(2n − 1) √π / 2^n] = (2n + 1)/2 = m/2.

LEMMA 2.1.7. Let X and Y be independent and identically distributed random variables with zero expectation and finite absolute moment of order 2 + δ, 0 < δ ≤ 1. Then

E|X − Y|^{2+δ} ≤ 2(E|X|^{2+δ} + σ² E|X|^δ),

where σ² = EX².

PROOF. Denote the distribution function of X by F(x). Making use of the elementary inequality |a − b|^δ ≤ |a|^δ + |b|^δ, which is true for all real a and b, we obtain

E|X − Y|^{2+δ} = ∫∫ |x − y|^{2+δ} dF(x) dF(y)
≤ ∫∫ (x − y)² (|x|^δ + |y|^δ) dF(x) dF(y)
= ∫∫ (x² + y²)(|x|^δ + |y|^δ) dF(x) dF(y) − 2 ∫∫ xy (|x|^δ + |y|^δ) dF(x) dF(y).

The second term on the right-hand side is equal to zero:

∫∫ xy (|x|^δ + |y|^δ) dF(x) dF(y) = 2 ∫ x dF(x) ∫ y|y|^δ dF(y) = 2 E(X) E(Y|Y|^δ) = 0;

therefore

E|X − Y|^{2+δ} ≤ E[(X² + Y²)(|X|^δ + |Y|^δ)] = 2(E|X|^{2+δ} + σ² E|X|^δ).


LEMMA 2.1.8. Let X and Y be independent random variables, and let EY = 0. Then

E|X + Y| ≥ E|X|.

PROOF. Denote the distribution function of X by F(x). For any real x,

E|x + Y| ≥ |E(x + Y)| = |x|;

therefore

E|X + Y| = ∫ E|x + Y| dF(x) ≥ ∫ |x| dF(x) = E|X|.


LEMMA 2.1.9. For any 0 < p < 1/2, the function

μ(x) = sin x / x − [p + (1 − p) cos x]

has a unique root x₀, x₀ > π, in the interval (0, 2π). In addition, μ(x) is positive for 0 < x < x₀ and negative for x₀ < x < 2π.

PROOF. First let us establish the following elementary inequality:

sin x / x > (1 + cos x)/2 (2.1.11)

for 0 < x < π. Indeed, for 0 < x < π we have

2 sin x − x(1 + cos x) = 4 sin(x/2) cos(x/2) − 2x cos²(x/2) = 4 cos(x/2) [sin(x/2) − (x/2) cos(x/2)] > 0;

therefore, making use of inequality (2.1.9), we obtain (2.1.11).

Now consider the function μ(x). If 0 < x ≤ π, then, in view of (2.1.11), we obtain

p + (1 − p) cos x < (1 + cos x)/2 < sin x / x,

i.e., for these x the function μ(x) is positive. Let x̄ denote the root of the equation sin x / x = cos x in the interval (π, 2π). For x̄ ≤ x < 2π,

sin x / x ≤ cos x ≤ p + (1 − p) cos x;

hence μ(x) is negative.
It remains to consider the interval (π, x̄). In this interval the function sin x / x decreases (because its derivative (x cos x − sin x)/x² is negative), while the function p + (1 − p) cos x increases. Hence sin x / x and p + (1 − p) cos x can have at most one intersection. Summing up, we obtain the assertion of the lemma.

LEMMA 2.1.10. For any 0 < δ ≤ 1,

cos x ≤ 1 − x²/2 + c(δ)|x|^{2+δ}

for all real x, where

c(δ) = (x_δ − sin x_δ) / ((2 + δ) x_δ^{1+δ}) (2.1.12)

and x_δ is the unique (in the interval (0, 2π)) root of the equation

(δ/(2(2 + δ))) x² + (1/(2 + δ)) x sin x + cos x = 1.

PROOF. Without loss of generality we can take x ≥ 0. Let us fix an arbitrary δ ∈ (0, 1] and consider the function

φ_z(x) = 1 − x²/2 + z x^{2+δ} − cos x,

depending on the non-negative parameter z. It is easy to see that for all sufficiently large z (for instance, for z ≥ 1/(2 + δ)), φ_z(x) ≥ 0 for all x ≥ 0. At the same time, for small z (say, for z = 0), φ_z(x) can take negative values. Thus there exists the infimum z_m of all z satisfying the condition that φ_z(x) is non-negative for all x ≥ 0, i.e.,

z_m = min{z ≥ 0: φ_z(x) ≥ 0, x ≥ 0}

(we write min instead of inf because this infimum is, obviously, attained). Let us prove that z_m = c(δ), where c(δ) is defined by (2.1.12).
Observe that all roots of the function φ_{z_m}(x) (we consider φ_{z_m}(x) only for x > 0), if they exist, are isolated. Besides, if x_r is a root of φ_{z_m}(x), then both conditions

φ_{z_m}(x_r) = 0 (2.1.13)

and

dφ_{z_m}(x)/dx |_{x=x_r} = 0 (2.1.14)

must be satisfied, because φ_{z_m}(x) cannot change its sign.
The function φ_{z_m}(x) has at least one positive root. Otherwise we would have, in view of the relations

lim_{x→∞} φ_{z_m}(x) = ∞ and φ_{z_m}(x) > 0, x > 0,

that for sufficiently small ε > 0,

φ_{z_m−ε}(x) ≥ 0, x ≥ 0,

which contradicts the definition of z_m. Denote the minimal positive root of φ_{z_m}(x) by x_m. Then relations (2.1.13) and (2.1.14) mean that

1 − x_m²/2 + z_m x_m^{2+δ} − cos x_m = 0

and

−x_m + (2 + δ) z_m x_m^{1+δ} + sin x_m = 0.

Finding z_m from the second equation:

z_m = (x_m − sin x_m) / ((2 + δ) x_m^{1+δ}),

and substituting this expression into the first one, we obtain the following equation for x_m:

1 − (δ/(2(2 + δ))) x_m² − (1/(2 + δ)) x_m sin x_m − cos x_m = 0.

It remains to show that the equation

1 − (δ/(2(2 + δ))) x² − (1/(2 + δ)) x sin x − cos x = 0 (2.1.15)

has a unique solution in the interval (0, 2π) for any 0 < δ ≤ 1.
Consider the left-hand side of equation (2.1.15):

ψ(x) = 1 − (δ/(2(2 + δ))) x² − (1/(2 + δ)) x sin x − cos x.

We have

ψ(0) = 0,  ψ(2π) = −2π²δ/(2 + δ) < 0, (2.1.16)

and

ψ′(x) = ((1 + δ)x/(2 + δ)) [sin x / x − (δ/(1 + δ) + (1/(1 + δ)) cos x)].

Due to Lemma 2.1.9, the expression in square brackets in the last relation has exactly one change of sign (at some point x₀) in the interval (0, 2π), and this expression is positive for 0 < x < x₀ and negative for x₀ < x < 2π. Therefore the derivative ψ′(x), which coincides with the expression in square brackets up to a positive factor, satisfies the same conditions. Hence ψ(x) increases on the interval (0, x₀) and decreases on (x₀, 2π). Taking (2.1.16) into account, we come to the conclusion that ψ(x) has a root in (0, 2π), and the root is unique.

LEMMA 2.1.11. For any θ ∈ (0, π/2] and all real x,

cos x ≤ cos θ − sin θ (|x| − θ) + (sin θ/(2θ)) (|x| − θ)². (2.1.17)

PROOF. First let us prove that for any θ ∈ (0, π/2] and all x ∈ [0, 2π],

cos x ≤ cos θ − sin θ (x − θ) + (sin θ/(2θ)) (x − θ)². (2.1.18)

Let us fix an arbitrary θ ∈ (0, π/2] and consider the function

ξ(x) = cos θ − sin θ (x − θ) + (sin θ/(2θ)) (x − θ)² − cos x.

The second-order derivative of ξ(x) is

ξ″(x) = sin θ/θ + cos x;

hence ξ″(x) has at most two zeros on the interval [0, 2π]. This implies that ξ(x) has at most four zeros (with account for their multiplicity) on [0, 2π]. But it is easy to see that

ξ(θ) = 0 and ξ′(θ) = 0,

i.e., the point θ is a zero of the second order of the function ξ(x). This means that ξ(x) does not change its sign at θ, and has no other zeros on [0, 2π]. At this point the second-order derivative takes a positive value. Indeed,

ξ″(θ) = sin θ/θ + cos θ > 0,

since both terms are non-negative for θ ∈ (0, π/2] and sin θ/θ > 0. Therefore ξ(x) is non-negative on the interval [0, 2π], i.e., (2.1.18) holds.
Observe now that the right-hand side of (2.1.18) increases for x > 2θ; hence for x ≥ 2π we have

cos θ − sin θ (x − θ) + (sin θ/(2θ)) (x − θ)² ≥ cos θ − sin θ (2π − θ) + (sin θ/(2θ)) (2π − θ)² ≥ cos 2π = 1 ≥ cos x,

i.e., (2.1.18) holds for all x ≥ 0. Taking into account that cos x is an even function, we finally arrive at (2.1.17).

The next two lemmas concern functions of bounded variation. The definition of these functions was given in Section 1.4.

LEMMA 2.1.12. Let p(x) and q(x) be two probability density functions, and let r(x) be their convolution:

r(x) = ∫ p(x − u) q(u) du = ∫ p(u) q(x − u) du.

Then

V(r) ≤ min{V(p), V(q)}.

PROOF. Let x₀ < x₁ < … < x_n be arbitrary points of the real line. We have

Σ_{i=1}^n |r(x_i) − r(x_{i−1})| = Σ_{i=1}^n |∫ p(x_i − u) q(u) du − ∫ p(x_{i−1} − u) q(u) du|
= Σ_{i=1}^n |∫ [p(x_i − u) − p(x_{i−1} − u)] q(u) du|
≤ ∫ Σ_{i=1}^n |p(x_i − u) − p(x_{i−1} − u)| q(u) du
≤ V(p) ∫ q(u) du = V(p).

Since n and x₀, x₁, …, x_n are arbitrary, this yields

V(r) ≤ V(p).

Similarly we obtain V(r) ≤ V(q).
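A discrete sketch of Lemma 2.1.12: convolve two densities on a grid and compare total variations, computed as sums of absolute increments (the test densities are assumed):

```python
h = 0.01
xs = [k * h for k in range(-300, 301)]

def tv(vals):
    return sum(abs(vals[i] - vals[i - 1]) for i in range(1, len(vals)))

# p: uniform density on [0, 1]; q: triangular density on [-1, 1].
p = [1.0 if 0 <= x <= 1 else 0.0 for x in xs]
q = [max(0.0, 1 - abs(x)) for x in xs]

# r = p * q (convolution), evaluated on the same grid by a Riemann sum.
n = len(xs)
r = []
for i in range(n):
    s = 0.0
    for j in range(n):
        k = i - j + n // 2       # index of xs[i] - xs[j] on the grid
        if 0 <= k < n:
            s += p[j] * q[k]
    r.append(s * h)

tv_p, tv_q, tv_r = tv(p), tv(q), tv(r)
```

Here tv_p = tv_q = 2 while the smoothed convolution has a strictly smaller variation.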



LEMMA 2.1.13. Let p(x) and q(x) be two probability density functions, and let r(x) be their convolution. If p(x) is n times differentiable, then

V(r^{(n)}) ≤ V(p^{(n)}).

The proof of the lemma is similar to that of Lemma 2.1.12: taking derivatives of both sides of the equality

r(x) = ∫ p(x − u) q(u) du,

we obtain

r^{(n)}(x) = ∫ p^{(n)}(x − u) q(u) du,

and now we can repeat the proof of Lemma 2.1.12 replacing p(x) by p^{(n)}(x).

LEMMA 2.1.14. Let z and t be complex numbers such that |z| ≤ 1, |t| ≤ 1. Then

|z − t| ≤ ||z| − |t|| + |arg z − arg t|.

PROOF. We have

|z − t| = ||z|e^{i arg z} − |t|e^{i arg t}|
= ||z|e^{i arg z} − |z|e^{i arg t} + |z|e^{i arg t} − |t|e^{i arg t}|
≤ |z||e^{i arg z} − e^{i arg t}| + |e^{i arg t}| ||z| − |t||
≤ |e^{i arg z} − e^{i arg t}| + ||z| − |t||.

Now it suffices to use the elementary inequality

|e^{iα} − e^{iβ}| ≤ |α − β|,

which is true for any real α and β.


LEMMA 2.1.15. For any positive integer n and any x ≥ 0,

|e^{ix} − 1 − ix/1! − … − (ix)^{n−1}/(n − 1)!| ≤ x^n/n!.

PROOF. Denote

v_n(x) = e^{ix} − 1 − ix/1! − … − (ix)^{n−1}/(n − 1)!.

Then

v_n(x) = i ∫₀^x v_{n−1}(u) du,

and the result is derived by induction, since

|v₁(x)| = |i ∫₀^x e^{iu} du| ≤ x.
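Lemma 2.1.15 is easy to check directly; a sketch:

```python
import cmath
import math

bad = 0
for n in (1, 2, 3, 5):
    for k in range(0, 101):
        x = 0.1 * k
        partial = sum((1j * x) ** j / math.factorial(j) for j in range(n))
        if abs(cmath.exp(1j * x) - partial) > x ** n / math.factorial(n) + 1e-12:
            bad += 1
```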

2.2. Inequalities for characteristic functions of distributions with bounded support

In this section, we consider characteristic functions of distributions with bounded support. We investigate this class of characteristic functions separately because in this case much stronger results can be obtained than in the general case.
Let X be a random variable with distribution function F(x). Recall that the concentration function of X (or of F) is defined as

Q_X(l) = Q(F; l) = sup_a P(a ≤ X ≤ a + l)

(a systematic analysis of the concentration function is contained in (Hengartner & Theodorescu, 1973)).

THEOREM 2.2.1. Let X and Y be bounded random variables, |X| ≤ c, |Y| ≤ c, c > 0, and let φ(t) and ψ(t) be their characteristic functions. If

Q_X(2z) ≤ P(|Y| ≤ z)

for any z ≥ 0, then

|φ(t)| ≤ ℜψ(t)

for |t| ≤ π/(2c).

PROOF. Without loss of generality, we can assume that 0 ≤ t ≤ π/(2c). Let us fix an arbitrary t from the interval [0, π/(2c)]. Denote the additive type of the distribution of the random variable X by 𝒜_X. Let us take an arbitrary distribution function F(x) from 𝒜_X and denote its characteristic function by f(t). Set

L_k = [−π/(2t) + 2πk/t, π/(2t) + 2πk/t], k = 0, ±1, ±2, …, (2.2.1)

B = {x: F(x) ≠ 0} ∩ {x: F(x) ≠ 1}.

Introduce the distance between sets on the real line and the diameter of a set as usual:

ρ(E₁, E₂) = inf_{x∈E₁, y∈E₂} |x − y|,  d(E) = sup_{x,y∈E} |x − y|,

where E, E₁, and E₂ are any subsets of the real line.
One of the following two inequalities holds:

ℜf(t) ≥ 0, (2.2.2)

or

ℜf(t) < 0. (2.2.3)

Assume, for example, that (2.2.2) holds. Then

|ℜf(t)| = ∫ cos(tx) dF(x) ≤ Σ_k ∫_{L_k} cos(tx) dF(x), (2.2.4)

because cos(tx) ≥ 0 when x ∈ ⋃_{k=−∞}^∞ L_k and cos(tx) < 0 when x ∉ ⋃_{k=−∞}^∞ L_k. Further,

ρ(L_i, L_j) ≥ π/t ≥ 2c ≥ d(B), i ≠ j;

therefore, there exists only one k₀ such that

∫_{L_{k₀}} dF(x) > 0,

i.e.,

∫_{L_k} dF(x) = 0

if k ≠ k₀, and (2.2.4) can be written as

|ℜf(t)| ≤ ∫_{L_{k₀}} cos(tx) dF(x). (2.2.5)

Denote the distribution function of the random variable Y by G(x) and set

Θ(x) = cos(tx) for x ∈ L_{k₀},  Θ(x) = 0 for x ∉ L_{k₀}.

Let X₀ and Y₀ be random variables with distribution functions F(x) and H(x) = G(x − 2πk₀/t), respectively. Since Y₀ = Y + 2πk₀/t and X₀ is some translation of X, we have

P(|X₀ − 2πk₀/t| < z) ≤ Q_X(2z) ≤ P(|Y| < z) = P(|Y₀ − 2πk₀/t| < z)

for any z > 0. Therefore Lemma 2.1.2 can be applied. Using Lemma 2.1.2 and (2.2.5), we obtain

|ℜf(t)| ≤ ∫_{L_{k₀}} cos(tx) dF(x) = ∫ Θ(x) dF(x) ≤ ∫ Θ(x) dH(x) = ℜ(ψ(t)e^{2πik₀}) = ℜψ(t)

(the support of Y₀ is contained in L_{k₀}). So, if (2.2.2) holds, the inequality

|ℜf(t)| ≤ ℜψ(t) (2.2.6)

is proved. In the case where (2.2.3) holds, the same inequality can be proved similarly. Thus, (2.2.6) is true for any F ∈ 𝒜_X and any t ∈ [−π/(2c), π/(2c)]. But the class of distributions 𝒜_X is closed with respect to translation; hence, by virtue of Lemma 2.1.1,

|φ(t)| ≤ ℜψ(t)

for |t| ≤ π/(2c).
THEOREM 2.2.2. Let X be a random variable with density function p(x) and characteristic function f(t). If |X| ≤ c with probability 1 and p(x) ≤ a for all x (c and a are some positive constants), then

|f(t)| ≤ (2a/|t|) sin(|t|/(2a)) for |t| ≤ π/(2c) (2.2.7)

and

|f(t)| ≤ (4ac/π) sin(π/(4ac)) for |t| ≥ π/(2c). (2.2.8)

PROOF. The first inequality follows directly from Theorem 2.2.1 (when Y is a random variable uniformly distributed in the interval [−1/(2a), 1/(2a)]). Let us prove the second one. Let 𝒜_X be the additive type of the distribution of the random variable X. Since 𝒜_X is closed with respect to translation, it suffices to prove (Lemma 2.1.1) that

|ℜψ(t)| ≤ (4ac/π) sin(π/(4ac)) for |t| ≥ π/(2c)

for any characteristic function ψ(t) such that the corresponding distribution belongs to 𝒜_X.
Let us fix an arbitrary ψ(t) from 𝒜_X and an arbitrary t ≥ π/(2c). One of the following two inequalities holds:

ℜψ(t) ≥ 0 (2.2.9)

or

ℜψ(t) < 0. (2.2.10)

Suppose, for example, that (2.2.9) holds. Denote by q(x) the density function corresponding to the characteristic function ψ(t), and let [y₀ − c, y₀ + c] be an interval such that q(x) = 0 if x ∉ [y₀ − c, y₀ + c]. Using Lemmas 2.1.4 and 2.1.5, we obtain

|ℜψ(t)| = ∫ q(x) cos(tx) dx = ∫_{y₀−c}^{y₀+c} q(x) cos(tx) dx ≤ ℱ_{y₀−c}^{y₀+c}(a cos(tx), 1/a) ≤ (4ac/π) sin(π/(4ac)),

provided that |t| ≥ π/(2c).

The case where (2.2.10) holds is treated similarly.

COROLLARY 2.2.1. Let the conditions of Theorem 2.2.2 be satisfied. Then

|f(t)| ≤ 1 − (t²/(24a²)) h₂(π/(4ac)) for |t| ≤ π/(2c)

and

|f(t)| ≤ 1 − (π²/(96a²c²)) h₂(π/(4ac)) for |t| ≥ π/(2c).

The corollary follows from the theorem and inequality (2.1.6). One just should take into account that 2ac ≥ 1.

REMARK. Both inequalities (2.2.7) and (2.2.8) are sharp. In (2.2.7), the equality is attained when p(x) is the uniform density in an interval of length 1/a. In (2.2.8), the equality is attained for arbitrarily large t under an appropriate choice of a distribution. In other words, although the characteristic function of any absolutely continuous distribution converges to zero as t → ∞, this convergence is not uniform even in each set of uniformly bounded densities having uniformly bounded supports (see Example 35 of Appendix A). The situation changes if the distribution under consideration is unimodal. In this case, uniform estimates can be obtained even without the condition of boundedness of the support (see Theorem 2.4.3).

Let h₁(α) and h₂(α) be the functions defined by (2.1.1).

THEOREM 2.2.3. Let X be a bounded random variable, |X| ≤ c, and let f(t) and σ² be its characteristic function and variance respectively. For any α ∈ [0, π/4],

    |f(t)| ≤ 1 − h₂(α) σ²t²/2

for |t| ≤ α/c.

PROOF. Fix an arbitrary α from the interval [0, π/4]. Denote the additive type of the distribution of the random variable X by 𝒜. Since 𝒜 is closed with respect to translation, it suffices to prove (Lemma 2.1.1) that

    |ℜψ(t)| ≤ 1 − h₂(α) σ²t²/2                    (2.2.11)

for |t| ≤ α/c for any characteristic function ψ(t) whose distribution belongs to 𝒜.

Let us fix an arbitrary ψ(t) whose distribution belongs to 𝒜 and fix an arbitrary t from the interval (0, α/c]. We will use the same notation as that introduced in the proof of Theorem 2.2.1. In particular, L_k (k = 0, ±1, ±2, ...) is determined by (2.2.1). As in the proofs of Theorems 2.2.1 and 2.2.2, we should separately consider the cases ℜψ(t) ≥ 0 and ℜψ(t) < 0, and, because the reasoning is similar, we will consider only the first case.

As in the proof of Theorem 2.2.1, we come to the conclusion that there exists an integer k₀ such that

    |ℜψ(t)| = ℜψ(t) ≤ ∫_{L_{k₀}} cos(tx) dF(x),   (2.2.12)

where F(x) is the distribution function corresponding to the characteristic function ψ(t). Set G(x) = F(x + 2πk₀/t).

Then σ_G² = σ_F² = σ², where σ_G² and σ_F² are the variances of the distributions G and F respectively. Substituting y = x − 2πk₀/t in (2.2.12), we obtain

    |ℜψ(t)| ≤ ∫_{L₀} cos(ty) dG(y).

Since the support of the distribution of the random variable X is included in the interval [−c, c], there exists some real d such that the support of F(x) is included in [d − c, d + c]. Let us introduce the following notation:

    H₊ = [π/(2t), ∞),      H₋ = (−∞, −π/(2t)],
    L₊ = [0, π/(2t)],      L₋ = [−π/(2t), 0],
    B₀ = [d − 2πk₀/t − c, d − 2πk₀/t + c].

It is easy to see that the support of the distribution G(x) is included in B₀. Three cases are possible:

(1) B₀ ⊂ L₊ ∪ L₋ = L₀;

(2) B₀ ∩ H₊ ≠ ∅,  B₀ ⊂ H₊ ∪ L₊;

(3) B₀ ∩ H₋ ≠ ∅,  B₀ ⊂ H₋ ∪ L₋.

Let us consider them separately.


1. We have

    ∫_{L₀} dG(x) = 1.

Let us use inequality (2.1.2) from Section 2.1; then we obtain

    |ℜψ(t)| ≤ ∫_{L₀} cos(tx) dG(x) ≤ ∫_{−π/(2t)}^{π/(2t)} (1 − h₂(α) t²x²/2) dG(x)
            = 1 − h₂(α) (t²/2) ∫_{−∞}^{∞} x² dG(x) ≤ 1 − h₂(α) σ²t²/2.

2. In this case

    d₀ = d − 2πk₀/t − c ≥ 0.

Let us set H(x) = G(x + d₀); then σ_H² = σ_G² = σ², and the support of the distribution H(x) is included in the interval [0, 2c]. Let us prove that

    ∫_{L₀} cos(tx) dH(x) ≥ ∫_{L₀} cos(tx) dG(x).

Indeed, we have

    Δ = ∫_{L₀} cos(tx) dH(x) − ∫_{L₀} cos(tx) dG(x)
      = ∫_0^{π/(2t)} cos(tx) dH(x) − ∫_{d₀}^{d₀+π/(2t)} cos(tx) dG(x)
      = ∫_{d₀}^{d₀+π/(2t)} cos[t(y − d₀)] dG(y) − ∫_{d₀}^{d₀+π/(2t)} cos(ty) dG(y)
      = ∫_{d₀}^{d₀+π/(2t)} { cos[t(y − d₀)] − cos(ty) } dG(y).

Taking into account that 0 ≤ d₀ ≤ π/(2t) and that cos x decreases in the interval [0, π], we see that Δ ≥ 0. Thus, we finally obtain

    ∫_{L₀} cos(tx) dG(x) ≤ ∫_{L₀} cos(tx) dH(x) = ∫_0^{2c} cos(tx) dH(x)
      ≤ 1 − h₂(α) (t²/2) ∫_{−π/(2t)}^{π/(2t)} x² dH(x) ≤ 1 − h₂(α) σ²t²/2.

3. This case can be considered similarly to case 2.

The proof of (2.2.11) is the same if the inequality ℜψ(t) < 0 holds (instead of ℜψ(t) ≥ 0).

COROLLARY 2.2.2. Let X be a bounded random variable, |X| ≤ c, and let f(t) and σ² be its characteristic function and variance respectively. Then

    |f(t)| ≤ 1 − 0.4 σ²t²

for |t| ≤ π/(4c).

THEOREM 2.2.4. Let X be a bounded random variable, |X| ≤ c, and let f(t) and σ² be its characteristic function and variance respectively. For any α ∈ [0, π/4],

    exp{ −h₁(α) σ²t²/2 } ≤ |f(t)| ≤ exp{ −h₂(α) σ²t²/2 }

for |t| ≤ α/c.

The upper bound in this theorem follows from Theorem 2.2.3, whereas the lower bound is a particular case of Theorem 2.3.4 below.

COROLLARY 2.2.3. Under the conditions of Theorem 2.2.4, the inequalities

    exp{ −(1 + α²/(2(2 − α²))) σ²t²/2 } ≤ |f(t)| ≤ exp{ −(1 − α²/12) σ²t²/2 }   (2.2.13)

are true for |t| ≤ α/c.

To obtain this corollary it suffices to use inequalities (2.1.4) and (2.1.5) from Section 2.1.

Taking various values of α in (2.2.13), we obtain estimates for |f(t)| which are more precise in narrower intervals. For example, setting α = 1/4, we obtain

    exp{ −1.016 σ²t²/2 } ≤ |f(t)| ≤ exp{ −0.994 σ²t²/2 }

for |t| ≤ 1/(4c).
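The α = 1/4 bounds can be checked directly on examples with explicit characteristic functions. The Python sketch below (a hypothetical illustration) uses X = ±1 with probability 1/2 (f(t) = cos t, σ² = 1, c = 1) and the uniform distribution on [−1, 1] (f(t) = (sin t)/t, σ² = 1/3, c = 1).

```python
import math

def check_2213(absf, sigma2, c, n=400):
    # (2.2.13) with alpha = 1/4:
    # exp(-1.016*s2*t^2/2) <= |f(t)| <= exp(-0.994*s2*t^2/2) for |t| <= 1/(4c)
    for k in range(1, n + 1):
        t = k / (4 * c * n)                     # grid on (0, 1/(4c)]
        lo = math.exp(-1.016 * sigma2 * t * t / 2)
        hi = math.exp(-0.994 * sigma2 * t * t / 2)
        v = absf(t)
        assert lo <= v <= hi, (t, lo, v, hi)

check_2213(lambda t: abs(math.cos(t)), sigma2=1.0, c=1.0)      # X = +-1
check_2213(lambda t: math.sin(t) / t, sigma2=1.0 / 3, c=1.0)   # uniform on [-1, 1]
print("two-sided bound (2.2.13) verified for two examples")
```

Both checks are tight near t = 1/(4c), which is consistent with the coefficients 1.016 and 0.994 being close to the optimal ones.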

2.3. Moment inequalities


Bounds for characteristic functions presented in this section involve only moments of the corresponding distributions. We start with estimates which can be considered as sharpening the expansions of characteristic functions given by Theorems 1.5.3-1.5.5.

THEOREM 2.3.1. Let f(t) be an even characteristic function. If for some non-negative integer n the derivative f^(4n)(0) exists, then

    | f(t) − Σ_{k=0}^{2n−1} (f^(2k)(0)/(2k)!) t^{2k} | ≤ (f^(4n)(0)/(4n)!) t^{4n}    (2.3.1)

for all t ∈ (−∞, ∞).

If the derivative f^(4n+2)(0) exists, then for all t ∈ (−∞, ∞),

    | f(t) − Σ_{k=0}^{2n} (f^(2k)(0)/(2k)!) t^{2k} | ≤ (|f^(4n+2)(0)|/(4n+2)!) t^{4n+2}.   (2.3.2)

PROOF. Let us prove the first inequality. Since f^(4n)(0) is finite, f^(4n)(t) exists and is continuous for all t ∈ (−∞, ∞). Therefore, the representation

    f(t) = Σ_{k=0}^{2n−1} (f^(2k)(0)/(2k)!) t^{2k} + (f^(4n)(θt)/(4n)!) t^{4n}    (2.3.3)

holds, where |θ| ≤ 1. For any s we have

    |f^(4n)(s)| = | ∫_{−∞}^{∞} x^{4n} cos(sx) dF(x) | ≤ ∫_{−∞}^{∞} x^{4n} dF(x) = f^(4n)(0),

i.e.,

    |f^(4n)(s)| ≤ f^(4n)(0).    (2.3.4)

Relations (2.3.3) and (2.3.4) imply (2.3.1).

Inequality (2.3.2) is proved similarly. One just should use the relation

    ∫_{−∞}^{∞} x^{4n+2} dF(x) = −f^(4n+2)(0).

COROLLARY 2.3.1. Let F(x) be a symmetric distribution function with characteristic function f(t), variance σ² and fourth moment μ₄. Then

    | f(t) − 1 + σ²t²/2 | ≤ μ₄ t⁴/24

for all t ∈ (−∞, ∞).
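Corollary 2.3.1 is a two-sided refinement of the second-order expansion and is easy to verify for symmetric distributions with explicit moments; a minimal Python check (a sketch, not part of the original text):

```python
import math

def check_cor231(f, sigma2, mu4, tmax=30.0, n=3000):
    # |f(t) - 1 + sigma2*t^2/2| <= mu4*t^4/24 on a grid over (0, tmax]
    for k in range(1, n + 1):
        t = tmax * k / n
        lhs = abs(f(t) - 1 + sigma2 * t * t / 2)
        assert lhs <= mu4 * t ** 4 / 24 + 1e-12, t

check_cor231(math.cos, sigma2=1.0, mu4=1.0)                      # X = +-1
check_cor231(lambda t: math.sin(t) / t, sigma2=1.0 / 3, mu4=0.2) # uniform [-1, 1]
print("Corollary 2.3.1 verified on the grid")
```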

THEOREM 2.3.2. Let F(x) be a distribution function with characteristic function f(t) and finite variance σ². Then

    |f(t)| ≥ 1 − σ²t²/2    (2.3.5)

for all real t. Moreover, if F(x) has zero expectation, then

    ℜf(t) ≥ 1 − σ²t²/2    (2.3.6)

for all t.

PROOF. First let us prove that if the expectation of F(x) is equal to zero, then (2.3.6) is true.

Since ℜf(t) is the characteristic function of a symmetric distribution, by virtue of Corollary 2.3.1, we obtain

    ℜf(t) ≥ 1 − σ₁²t²/2,

where σ₁² is the variance of the distribution corresponding to ℜf(t). Thus, (2.3.6) will be proved if we demonstrate that σ₁² = σ².

We have (in view of the first part of Theorem 1.5.3)

    f(t)  = 1 − σ²t²/2 + o(t²),   t → 0,
    f(−t) = 1 − σ²t²/2 + o(t²),   t → 0;

therefore

    ℜf(t) = (1/2)(f(t) + f(−t)) = 1 − σ²t²/2 + o(t²),   t → 0,

which, in view of the second part of Theorem 1.5.3, implies that the variance of the distribution corresponding to ℜf(t) is σ².

Now suppose that F(x) has an arbitrary expectation. Denote it by a. Then the expectation of the distribution function F(x + a) is equal to zero; hence

    ℜ(e^{−ita} f(t)) ≥ 1 − σ²t²/2;

therefore we finally obtain

    |f(t)| = |e^{−ita} f(t)| ≥ ℜ(e^{−ita} f(t)) ≥ 1 − σ²t²/2.

THEOREM 2.3.3. Let F(x) be a distribution function with zero expectation, variance σ² and characteristic function f(t). Then

    |1 − f(t)| ≤ σ²t²/2

for all real t.

PROOF. Making use of Lemma 2.1.15, we obtain

    |1 − f(t)| = | ∫_{−∞}^{∞} (e^{itx} − 1) dF(x) | = | ∫_{−∞}^{∞} (e^{itx} − 1 − itx) dF(x) |
               ≤ ∫_{−∞}^{∞} |e^{itx} − 1 − itx| dF(x) ≤ ∫_{−∞}^{∞} (t²x²/2) dF(x) = σ²t²/2.

REMARK. The inequality of Theorem 2.3.3 is not true if the expectation of F(x) is not equal to zero. However, in this case, one can use the estimate

    |1 − f(t)| ≤ β₁|t|,

where, as usual, β₁ is the first absolute moment. Indeed, again using Lemma 2.1.15, we obtain

    |1 − f(t)| ≤ ∫_{−∞}^{∞} |1 − e^{itx}| dF(x) ≤ ∫_{−∞}^{∞} |tx| dF(x) = β₁|t|.
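Both estimates — |1 − f(t)| ≤ σ²t²/2 for zero mean and |1 − f(t)| ≤ β₁|t| in general — can be illustrated with the centered exponential distribution, for which f(t) = e^{−it}/(1 − it), σ² = 1 and β₁ = E|X − 1| = 2/e. A Python sketch (hypothetical example, not from the original text):

```python
import cmath
import math

beta1 = 2 / math.e      # E|X - 1| for X ~ Exp(1)
sigma2 = 1.0

def one_minus_f(t):
    # CF of X - 1, X ~ Exp(1): f(t) = exp(-it)/(1 - it)
    return abs(1 - cmath.exp(-1j * t) / (1 - 1j * t))

for k in range(1, 1000):
    t = 0.02 * k
    d = one_minus_f(t)
    assert d <= sigma2 * t * t / 2 + 1e-12      # Theorem 2.3.3 (zero mean)
    assert d <= beta1 * t + 1e-12               # remark: |1 - f(t)| <= beta_1 |t|
print("both bounds on |1 - f(t)| hold up to t = 20")
```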

From Theorem 2.3.2 and inequality (2.1.3) we immediately obtain the fol-
lowing estimate.

THEOREM 2.3.4. Let f(t) be the characteristic function of a distribution with finite variance σ². Then for any α ∈ (0, √2),

    |f(t)| ≥ exp{ −h₁(α) σ²t²/2 }

for |t| ≤ α/σ.

If the variance is infinite but the first-order moment is finite, then the
following simple lower estimate can be used.

THEOREM 2.3.5. Let X be a random variable with characteristic function f(t). If

    β₁ = E|X| < ∞,

then

    ℜf(t) ≥ 1 − β₁|t|    (2.3.7)

for all t. Moreover, the difference between the left- and right-hand sides of (2.3.7) does not decrease as a function of |t|.

PROOF. It clearly suffices to confine our considerations to the case t > 0. Consider the function

    v(t) = ℜf(t) − (1 − β₁t).

We have v(0) = 0 and

    v′(t) = β₁ − ∫_{−∞}^{∞} x sin(tx) dF(x) ≥ β₁ − ∫_{−∞}^{∞} |x| dF(x) = 0,

i.e., v(t) is non-decreasing for t > 0. This implies that v(t) ≥ 0.

In connection with Theorems 2.3.2 and 2.3.5, the question arises whether inequalities of the form (2.3.5) and (2.3.7) hold true for moments of any order between 0 and 2. More exactly, is the following assertion true: for any r ∈ (0, 2] there exists some constant c = c(r) depending only on r such that for any characteristic function f(t) of a distribution with a finite absolute moment β_r of order r, the inequality |f(t)| ≥ 1 − c(r)β_r|t|^r is true? The question is still open.

THEOREM 2.3.6. Let F(x) be a distribution function with zero expectation, variance σ² and characteristic function f(t). Denote the absolute moment of F(x) of order α > 0 by β_α:

    β_α = ∫_{−∞}^{∞} |x|^α dF(x).

If

    β_{2+δ} < ∞

for some 0 < δ ≤ 1, then

    |f(t)|² ≤ 1 − σ²t² + 2c(δ)(β_{2+δ} + σ²β_δ)|t|^{2+δ}    (2.3.8)

for all real t, where c(δ) is the constant (depending only on δ) which is defined by (2.1.12).

PROOF. Let X and Y be independent and identically distributed random variables having the distribution function F(x). By virtue of Lemma 2.1.10,

    |f(t)|² = E cos(t(X − Y)) ≤ E[ 1 − (X − Y)²t²/2 + c(δ)|X − Y|^{2+δ}|t|^{2+δ} ]
            = 1 − σ²t² + c(δ)|t|^{2+δ} E|X − Y|^{2+δ}.

Using Lemma 2.1.7, we obtain (2.3.8).

The values of the constant c(δ) for some δ are given in Table 2.1.
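The constant c(δ) enters through the elementary inequality cos u ≤ 1 − u²/2 + c(δ)|u|^{2+δ} used in the proof above; assuming c(δ) is the smallest constant with this property, a crude grid search in Python recovers the table entry for δ = 1 (a sketch under that assumption):

```python
import math

# For delta = 1: c(1) = sup over u > 0 of (cos u - 1 + u^2/2) / u^3.
# The supremum is attained near u = 4; the ratio tends to 0 as u -> 0 and u -> oo.
sup_ratio = max((math.cos(u) - 1 + u * u / 2) / u ** 3
                for u in (k * 1e-3 for k in range(1, 100001)))
print(round(sup_ratio, 5))   # close to the table entry 0.09916
assert abs(sup_ratio - 0.09916) < 1e-4
```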

COROLLARY 2.3.2. Let the conditions of Theorem 2.3.6 be satisfied. Then


2t2
|/(f)| < 1 - - + c()(2+s + oZsM2*6. (2.3.9)

To obtain the corollary, it suffices to use the elementary inequality 1 + 2x <
(1 + x)2, which is valid for any real x, and observe that 1 + is non-negative
every time when 1 + 2x is non-negative.
Inequalities (2.3.8) and (2.3.9) are sharp (in the sense that the constant at |t|^{2+δ} cannot be made smaller) if we consider a characteristic function on the whole real line. But if we are interested in its behavior only in some neighborhood of the origin, then more accurate estimates can be obtained. In fact, Corollary 2.3.2 implies the following assertion.

THEOREM 2.3.7. Let F(x) be a distribution function with characteristic function f(t), variance σ² and finite absolute moment β_{2+δ₀}, 0 < δ₀ ≤ 1. Then for any 0 ≤ δ < δ₀ and any ε > 0 there exists Δ > 0 such that

    |f(t)| ≤ 1 − σ²t²/2 + ε|t|^{2+δ}    (2.3.10)

for |t| ≤ Δ.



Table 2.1.

  δ      c(δ)       δ      c(δ)       δ      c(δ)       δ      c(δ)
  0.01   0.49090    0.26   0.31490    0.51   0.20744    0.76   0.14035
  0.02   0.48200    0.27   0.30953    0.52   0.20411    0.77   0.13825
  0.03   0.47328    0.28   0.30427    0.53   0.20085    0.78   0.13619
  0.04   0.46474    0.29   0.29910    0.54   0.19765    0.79   0.13417
  0.05   0.45637    0.30   0.29404    0.55   0.19451    0.80   0.13219
  0.06   0.44818    0.31   0.28908    0.56   0.19142    0.81   0.13024
  0.07   0.44016    0.32   0.28421    0.57   0.18840    0.82   0.12833
  0.08   0.43230    0.33   0.27943    0.58   0.18543    0.83   0.12645
  0.09   0.42460    0.34   0.27475    0.59   0.18251    0.84   0.12460
  0.10   0.41705    0.35   0.27015    0.60   0.17965    0.85   0.12279
  0.11   0.40966    0.36   0.26564    0.61   0.17684    0.86   0.12101
  0.12   0.40241    0.37   0.26122    0.62   0.17408    0.87   0.11926
  0.13   0.39532    0.38   0.25689    0.63   0.17137    0.88   0.11754
  0.14   0.38836    0.39   0.25263    0.64   0.16871    0.89   0.11585
  0.15   0.38154    0.40   0.24846    0.65   0.16610    0.90   0.11420
  0.16   0.37486    0.41   0.24437    0.66   0.16354    0.91   0.11257
  0.17   0.36831    0.42   0.24035    0.67   0.16103    0.92   0.11097
  0.18   0.36190    0.43   0.23641    0.68   0.15856    0.93   0.10940
  0.19   0.35560    0.44   0.23254    0.69   0.15613    0.94   0.10786
  0.20   0.34944    0.45   0.22875    0.70   0.15375    0.95   0.10634
  0.21   0.34339    0.46   0.22503    0.71   0.15142    0.96   0.10485
  0.22   0.33747    0.47   0.22138    0.72   0.14912    0.97   0.10339
  0.23   0.33166    0.48   0.21779    0.73   0.14687    0.98   0.10195
  0.24   0.32596    0.49   0.21428    0.74   0.14465    0.99   0.10054
  0.25   0.32038    0.50   0.21082    0.75   0.14248    1.00   0.09916

PROOF. If the expectation of F(x) is equal to zero, then the assertion of the theorem directly follows from Corollary 2.3.2. Indeed,

    |f(t)| ≤ 1 − σ²t²/2 + c(δ₀)(β_{2+δ₀} + σ²β_{δ₀})|t|^{2+δ₀}
           = 1 − σ²t²/2 + c(δ₀)(β_{2+δ₀} + σ²β_{δ₀})|t|^{δ₀−δ}|t|^{2+δ},

and for any ε > 0, (2.3.10) holds true if we set

    Δ = ( ε / (c(δ₀)(β_{2+δ₀} + σ²β_{δ₀})) )^{1/(δ₀−δ)}.

Suppose that the expectation of F(x) is not equal to zero. Let X be a random variable whose distribution function is F(x). The random variable X − a, where a = EX, has distribution function F(x + a), zero expectation, variance σ², and characteristic function e^{−ita}f(t). Further,

    E|X − a|^{δ₀} ≤ E(|X|^{δ₀} + |a|^{δ₀}) = β_{δ₀} + |a|^{δ₀},

and, by the Minkowski inequality,

    E|X − a|^{2+δ₀} ≤ [ (β_{2+δ₀})^{1/(2+δ₀)} + |a| ]^{2+δ₀}.

Thus, the problem is reduced to the case of zero expectation.

THEOREM 2.3.8. Let F(x) be a distribution function with zero expectation, characteristic function f(t), finite variance σ² and first absolute moment β₁. Then

    |f(t)|² ≤ −cos √(π² − 2πβ₁|t| + 2σ²t²)    (2.3.11)

for |t| ≤ πβ₁/σ².

PROOF. As in the proof of Theorem 2.3.6, let X and Y be independent, identically distributed random variables with distribution function F(x). Making use of Lemma 2.1.11, and taking into account that, due to Lemma 2.1.8, E|X − Y| ≥ E|X| = β₁, we obtain for any λ ∈ (0, π],

    |f(t)|² = E cos(t(X − Y))
            ≤ −cos λ + (sin λ/(2λ)) E[ (π − |t||X − Y|)² − λ² ]
            ≤ −cos λ + (sin λ/(2λ)) [ π² − 2πβ₁|t| + 2σ²t² − λ² ].    (2.3.12)

Let |t| ≤ πβ₁/σ². Then

    √(π² − 2πβ₁|t| + 2σ²t²) ≤ π

(the term under the square root is non-negative because, due to the Lyapunov inequality, β₁² ≤ σ²); therefore, we may set

    λ = √(π² − 2πβ₁|t| + 2σ²t²)

in (2.3.12) (this value minimizes the right-hand side). After that we obtain

    |f(t)|² ≤ −cos √(π² − 2πβ₁|t| + 2σ²t²),

which coincides with (2.3.11).
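Inequality (2.3.11) gives equality at t = 0 and stays valid on the whole admissible range |t| ≤ πβ₁/σ². The Python sketch below (hypothetical examples, not from the original text) checks it for the two-point distribution X = ±1 (f(t) = cos t, β₁ = σ = 1) and the standard Laplace distribution (|f(t)|² = (1 + t²)^{−2}, β₁ = 1, σ² = 2).

```python
import math

def rhs(t, beta1, sigma2):
    # right-hand side of (2.3.11)
    s = math.pi ** 2 - 2 * math.pi * beta1 * t + 2 * sigma2 * t * t
    return -math.cos(math.sqrt(s))

# X = +-1 with prob 1/2: f(t) = cos t, beta1 = sigma = 1, valid for |t| <= pi
for k in range(1, 1001):
    t = math.pi * k / 1000
    assert math.cos(t) ** 2 <= rhs(t, 1.0, 1.0) + 1e-12

# standard Laplace: |f(t)|^2 = (1 + t^2)^-2, beta1 = 1, sigma2 = 2, |t| <= pi/2
for k in range(1, 1001):
    t = (math.pi / 2) * k / 1000
    assert (1 + t * t) ** -2 <= rhs(t, 1.0, 2.0) + 1e-12
print("inequality (2.3.11) verified for both examples")
```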



Combining Theorems 2.3.2 and 2.3.6, we obtain an upper bound for the imaginary part of the characteristic function.

THEOREM 2.3.9. Let F(x) be a distribution function with zero expectation and finite absolute moment of order 2 + δ, 0 < δ ≤ 1. Then

    |ℑf(t)| ≤ [ 2c(δ)(β_{2+δ} + σ²β_δ) ]^{1/2} |t|^{1+δ/2}    (2.3.13)

for all real t, where c(δ) is the constant defined by (2.1.12).

PROOF. From Theorem 2.3.6, we obtain

    [ℜf(t)]² + [ℑf(t)]² ≤ 1 − σ²t² + 2c(δ)(β_{2+δ} + σ²β_δ)|t|^{2+δ}.    (2.3.14)

On the other hand, by virtue of Theorem 2.3.2,

    [ℜf(t)]² ≥ (1 − σ²t²/2)² = 1 − σ²t² + σ⁴t⁴/4 ≥ 1 − σ²t².    (2.3.15)

Bounds (2.3.14) and (2.3.15) obviously yield (2.3.13).

2.4. Estimates for the characteristic functions of unimodal distributions

The characteristic functions of unimodal distributions are characterized by a more regular behavior than in the general case. Accordingly, more accurate estimates can be obtained under the condition of unimodality. We start with inequalities based on the idea of majorization.

Let X be a random variable with distribution function F(x) and concentration function Q_X(l) (the definition of the concentration function, which is also denoted by Q(F; l), was given in Section 1.4).

THEOREM 2.4.1. Let F(x) be a unimodal distribution function with characteristic function f(t). Then

    |f(t)| ≤ Q(F; π/|t|)    (2.4.1)

for any t ∈ (−∞, ∞), t ≠ 0.

PROOF. It obviously suffices to confine our considerations to the case t > 0. Let t be a fixed arbitrary positive number. We first prove that

    ℜf(t) ≤ Q(F; π/t).    (2.4.2)

Let μ be a mode of the distribution F (any mode, if it is not unique). We set

    c_n(t) = −π/(2t) + πn/t,    n = 0, ±1, ±2, ...,

and

    a_n(t) = ∫_{c_n(t)}^{c_{n+1}(t)} cos(tx) dF(x).

Let us demonstrate that there exists an integer m = m(t) such that

    |ℜf(t)| ≤ |a_m(t)|.    (2.4.3)

We shall assume that ℜf(t) ≥ 0 (the case ℜf(t) < 0 is treated in a similar fashion). There exists an integer k = k(t) such that

    c_k(t) = −π/(2t) + πk/t ≤ μ ≤ π/(2t) + πk/t = c_{k+1}(t).

Since the cases a_k(t) ≥ 0 and a_k(t) < 0 are handled in one and the same way, we shall assume that a_k(t) ≥ 0. For any integer n, a_n(t) and a_{n+1}(t) have opposite signs and, in addition, by virtue of the unimodality of the distribution F(x), |a_n(t)| ≥ |a_{n+1}(t)| for n ≥ k + 1 and |a_n(t)| ≤ |a_{n+1}(t)| for n ≤ k − 1. Therefore

    Σ_{n=k+1}^{∞} a_n(t) ≤ 0

and

    Σ_{n=−∞}^{k−1} a_n(t) ≤ 0.

But

    ℜf(t) = Σ_{n=−∞}^{∞} a_n(t),

hence ℜf(t) ≤ a_k(t), which proves (2.4.3) (with m = k).

Furthermore,

    a_k(t) = ∫_{c_k(t)}^{c_{k+1}(t)} cos(tx) dF(x) ≤ ∫_{c_k(t)}^{c_{k+1}(t)} dF(x) ≤ Q(F; π/t),

because the interval [c_k(t), c_{k+1}(t)] has length π/t. Since t is arbitrary, inequality (2.4.2) is proved.

It is easy to see that the distribution function F_b(x) = F(x − b) with characteristic function e^{ibt}f(t) satisfies the conditions of the theorem for any b. Hence,

    ℜ(e^{ibt} f(t)) ≤ Q(F_b; π/t) = Q(F; π/t).

Choose b so that ℑ(e^{ibt}f(t)) = 0 (for a given fixed t). This is always possible because the equation a sin x + b cos x = 0 has roots for any real a and b. Then

    |f(t)| = |ℜ(e^{ibt} f(t))| ≤ Q(F; π/t).

Such a b can be chosen for any value of t, hence inequality (2.4.1) holds for any t.
COROLLARY 2.4.1. Let X be a random variable with a symmetric unimodal distribution, and let f(t) be its characteristic function. Then

    f(π/(2a)) ≤ P(|X| ≤ a)

for all a > 0.
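Theorem 2.4.1 can be illustrated with the standard Laplace distribution, which is unimodal with f(t) = 1/(1 + t²) and concentration function Q(F; l) = P(|X| ≤ l/2) = 1 − e^{−l/2}. A Python sketch (hypothetical example):

```python
import math

# Check |f(t)| <= Q(F; pi/t) for the standard Laplace distribution on a grid.
for k in range(1, 10001):
    t = 0.01 * k
    lhs = 1 / (1 + t * t)                       # |f(t)|
    bound = 1 - math.exp(-math.pi / (2 * t))    # Q(F; pi/t)
    assert lhs <= bound + 1e-12
print("Theorem 2.4.1 verified for the Laplace distribution up to t = 100")
```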

THEOREM 2.4.2. Let X and Y be random variables with distribution functions F(x) and G(x) and characteristic functions f(t) and g(t) respectively. Suppose that the following conditions hold:

(1) the distribution of X is unimodal;

(2) P(|Y| ≤ c) = 1 for some c > 0;

(3) Q_X(l) ≤ P(|Y| ≤ l/2) for any l > 0.

Then

    |f(t)| ≤ ℜg(t)    (2.4.4)

for |t| ≤ π/(2c).

PROOF. It suffices to confine our considerations to the case of positive t. Let us fix an arbitrary t ∈ (0, π/(2c)]. Let μ, c_n(t) and a_n(t), n = 0, ±1, ±2, ..., denote the same as in the proof of Theorem 2.4.1. Then for any t there exists an integer m such that (2.4.3) holds (see the proof of Theorem 2.4.1). Then

    |ℜf(t)| ≤ |a_m(t)| ≤ ∫_{c_m(t)}^{c_{m+1}(t)} |cos(tx)| dF(x) = ∫_{−π/(2t)}^{π/(2t)} cos(tx) dF(x + πm/t).    (2.4.5)

We set F̃(x) = F(x + πm/t), and denote by X̃ a random variable with distribution function F̃(x) (that is, X̃ = X − πm/t). Then (2.4.5) means that

    |ℜf(t)| ≤ ∫_{−π/(2t)}^{π/(2t)} cos(tx) dF̃(x).    (2.4.6)

Observe that, in view of condition (3) of the theorem,

    P(|X̃| ≤ l) ≤ P(|Y| ≤ l)    (2.4.7)

for any l ≥ 0; therefore, taking (2.4.6) and (2.4.7) into account and applying Lemma 2.1.2, we obtain

    ∫_{−π/(2t)}^{π/(2t)} cos(tx) dF̃(x) ≤ ∫_{−π/(2t)}^{π/(2t)} cos(tx) dG(x) = ∫_{−∞}^{∞} cos(tx) dG(x) = ℜg(t)

(the last equality holds because P(|Y| ≤ c) = 1 and c ≤ π/(2t)). The passage from the resultant inequality to inequality (2.4.4) is accomplished in exactly the same way as in the proof of Theorem 2.4.1.
COROLLARY 2.4.2. Let F(x) be a unimodal distribution function with characteristic function f(t). Then, for any positive b,

    |f(t)| ≤ 1 − (1 − Q(F; π/b)) t²/(2b²)

for |t| ≤ b.

PROOF. We will use Theorem 2.4.2. As Y, consider the random variable taking the three values −π/(2b), 0, π/(2b) with probabilities

    (1 − Q(F; π/b))/2,    Q(F; π/b),    (1 − Q(F; π/b))/2

respectively. Then

    |f(t)| ≤ Q(F; π/b) + (1 − Q(F; π/b)) cos(πt/(2b)).

Now we make use of inequality (2.1.7) for |t| ≤ b:

    |f(t)| ≤ Q(F; π/b) + (1 − Q(F; π/b))(1 − t²/(2b²)) = 1 − (1 − Q(F; π/b)) t²/(2b²).


THEOREM 2.4.3. Let F(x) be an absolutely continuous unimodal distribution function with density p(x) and characteristic function f(t). If

    sup_x p(x) ≤ a < ∞,

then

    |f(t)| ≤ (2a/|t|) sin(|t|/(2a))    (2.4.8)

for |t| ≤ πa, and

    |f(t)| ≤ 2a/|t|    (2.4.9)

for all t.

PROOF. Let us prove the first inequality. Let X be a random variable with the distribution function F(x) and Y be a random variable with the uniform distribution on the interval [−1/(2a), 1/(2a)]. It is easy to see that the random variables X and Y satisfy the conditions of Theorem 2.4.2; hence |f(t)| ≤ ℜg(t) for |t| ≤ πa, where g(t) is the characteristic function of Y. Taking into account that

    g(t) = (2a/t) sin(t/(2a)),

we arrive at (2.4.8).

Let us prove the second inequality. Without loss of generality we can assume that t > 0. Let μ, c_n(t) and a_n(t), n = 0, ±1, ±2, ..., denote the same as in the proof of Theorem 2.4.1. Then for any t there exists an integer m such that (2.4.3) holds (see the proof of Theorem 2.4.1). Therefore,

    |ℜf(t)| ≤ |a_m(t)| = | ∫_{c_m(t)}^{c_{m+1}(t)} cos(tx) p(x) dx |
            ≤ a ∫_{c_m(t)}^{c_{m+1}(t)} |cos(tx)| dx = a ∫_{−π/(2t)}^{π/(2t)} cos(tx) dx = 2a/t.

Combining inequality (2.4.8) with inequality (2.1.6), we arrive at the following assertion.

COROLLARY 2.4.3. Let the conditions of Theorem 2.4.3 be satisfied. Then for any 0 < c ≤ π/2,

    |f(t)| ≤ 1 − h₂(c) t²/(24a²)

for |t| ≤ 2ac, where the function h₂(x) is defined by (2.1.1).

In particular, setting c = π/2, we obtain the following.

COROLLARY 2.4.4. Let the conditions of Theorem 2.4.3 be satisfied. Then

    |f(t)| ≤ 1 − t²/(3π²a²)

for |t| ≤ πa.

2.5. Estimates for the characteristic functions of absolutely continuous distributions

In this section, we deal with the characteristic functions of absolutely continuous distributions. All estimates presented in the section are based on the assumption that the probability density of the distribution under consideration is bounded. However, the boundedness of the probability density alone is not sufficient to obtain meaningful estimates, both for small values of the argument and for those converging to infinity (see Examples 35 and 36 of Appendix A). Some additional restrictions are necessary. These restrictions may consist in a sufficiently fast decrease rate of the distribution tails (for obtaining estimates in a neighborhood of the origin) or in the boundedness of the total variation of the density function (for estimating the characteristic function at large values of the argument).

Let us denote by g(x), x ≥ 0, a non-negative increasing function such that g(x) → ∞ as x → ∞. The inverse function of g(x) will be denoted by g⁻¹(x).

THEOREM 2.5.1. Let X be a random variable with probability density function p(x) and characteristic function f(t), and let ε be an arbitrary real number such that 0 < ε < 1. If

    sup_x p(x) ≤ a < ∞

and

    Eg(|X|) < ∞,

then

    |f(t)| ≤ 1 − ((1 − ε)³/(3π²a²)) t²    (2.5.1)

for |t| ≤ π/(2c), and

    |f(t)| ≤ 1 − (1 − ε)³/(12a²c²)    (2.5.2)

for |t| ≥ π/(2c), where c = g⁻¹(Eg(|X|)/ε).

PROOF. We set

    δ_c = P(|X| ≥ c),
    q(x) = p(x)/(1 − δ_c) for |x| < c,   q(x) = 0 for |x| ≥ c.

We have

    g(c) = Eg(|X|)/ε,

and hence, using the Chebyshov inequality, we obtain

    δ_c ≤ ε.    (2.5.3)

Further, q(x) is a probability density function whose support belongs to the interval [−c, c] and such that

    sup_x q(x) ≤ a/(1 − δ_c).

We have

    |f(t)| = | ∫_{−∞}^{∞} e^{itx}p(x)dx | ≤ | ∫_{−c}^{c} e^{itx}p(x)dx | + ∫_{|x|≥c} p(x)dx
           ≤ (1 − δ_c) | ∫_{−∞}^{∞} e^{itx}q(x)dx | + δ_c.

Corollary 2.2.1 yields

    | ∫_{−∞}^{∞} e^{itx}q(x)dx | ≤ 1 − ((1 − δ_c)²/(3π²a²)) t²

for |t| ≤ π/(2c). Taking (2.5.3) into account, we finally obtain

    |f(t)| ≤ (1 − δ_c)[ 1 − ((1 − δ_c)²/(3π²a²)) t² ] + δ_c
           = 1 − ((1 − δ_c)³/(3π²a²)) t² ≤ 1 − ((1 − ε)³/(3π²a²)) t²

for |t| ≤ π/(2c).

The proof of inequality (2.5.2) is similar; one should just use the second inequality of Corollary 2.2.1 to estimate the expression |∫_{−∞}^{∞} e^{itx}q(x)dx| for |t| ≥ π/(2c).

Setting g(x) = x^κ (κ > 0) in Theorem 2.5.1, we obtain the following assertion.

COROLLARY 2.5.1. Let X be a random variable with probability density function p(x) and characteristic function f(t), and let ε be an arbitrary real number such that 0 < ε < 1. If sup_x p(x) ≤ a < ∞ and β_κ = E|X|^κ < ∞ (κ > 0), then

    |f(t)| ≤ 1 − ((1 − ε)³/(3π²a²)) t²   for |t| ≤ (π/2)(ε/β_κ)^{1/κ}

and

    |f(t)| ≤ 1 − ((1 − ε)³/(12a²)) (ε/β_κ)^{2/κ}   for |t| ≥ (π/2)(ε/β_κ)^{1/κ}.

The set of distributions whose densities are bounded by the same constant a is closed with respect to translation. This implies that β_κ in Corollary 2.5.1 can be replaced by min_y E|X − y|^κ.

One particular case of Corollary 2.5.1 (κ = 2, ε = 1/4) has to be pointed out.

COROLLARY 2.5.2. Let X be a random variable with probability density function p(x), characteristic function f(t), and variance σ². If

    sup_x p(x) ≤ a < ∞,

then

    |f(t)| ≤ 1 − 9t²/(64π²a²)

for |t| ≤ π/(4σ), and

    |f(t)| ≤ 1 − 9/(1024 a²σ²)

for |t| ≥ π/(4σ).

COROLLARY 2.5.3. Let the conditions of Corollary 2.5.2 be satisfied. Then

    |f(t)| ≤ exp{ −9t²/(64a²(π² + 16σ²t²)) }    (2.5.4)

for all t.

PROOF. From Corollary 2.5.2 (together with the elementary inequality 1 − x ≤ e^{−x}) we obtain

    |f(t)| ≤ max{ exp{ −9t²/(64π²a²) }, exp{ −9/(1024a²σ²) } }    (2.5.5)

for all t. Moreover, the inequality

    min{x, y} ≥ xy/(x + y)

holds for any positive numbers x and y. Therefore

    min{ 9t²/(64π²a²), 9/(1024a²σ²) } ≥ 9t²/(64a²(π² + 16σ²t²)).

From this inequality and (2.5.5) we arrive at (2.5.4).
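Inequality (2.5.4) trades accuracy for uniformity in t; for normal distributions (a = 1/(σ√(2π)), |f(t)| = e^{−σ²t²/2}) it is quite loose but easy to verify numerically. A Python sketch (hypothetical example):

```python
import math

def check_normal(sigma, tmax=10.0, n=1000):
    a = 1 / (sigma * math.sqrt(2 * math.pi))   # sup of the N(0, sigma^2) density
    s2 = sigma * sigma
    for k in range(1, n + 1):
        t = tmax * k / n
        f = math.exp(-s2 * t * t / 2)          # |f(t)| for N(0, sigma^2)
        bound = math.exp(-9 * t * t / (64 * a * a * (math.pi ** 2 + 16 * s2 * t * t)))
        assert f <= bound + 1e-15

for s in (0.1, 1.0, 10.0):
    check_normal(s)
print("inequality (2.5.4) verified for normal distributions")
```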

In Section 2.7, an inequality will be proved for multivariate characteristic functions (Theorem 2.7.15) whose one-dimensional version has the same form as (2.5.4) (with different coefficients, though). We point out that inequality here because sometimes (for certain values of t and σ) it is more accurate than (2.5.4): under the conditions of Corollary 2.5.2, it bounds |f(t)|, for all t, by an expression of the form exp{ −c₁t²/(a²(c₂ + c₃σ²t²)) }; see Theorem 2.7.15 for the explicit coefficients.

THEOREM 2.5.2. Let c be an arbitrary positive number and Δ₁, Δ₂, ... be a set of disjoint intervals such that each of them is of length 2c and

    ∪_{i=1}^{∞} Δ_i = R¹.

We assume that X is a random variable with density function p(x) and characteristic function f(t), and set P_k = P(X ∈ Δ_k), k = 1, 2, .... If

    sup_x p(x) ≤ a,

then

    |f(t)| ≤ 1 − (t²/(3π²a²)) Σ_{k=1}^{∞} P_k³    (2.5.7)

for |t| ≤ π/(2c), and

    |f(t)| ≤ 1 − (1/(12a²c²)) Σ_{k=1}^{∞} P_k³    (2.5.8)

for |t| ≥ π/(2c).

PROOF. We set

    p_j(x) = p(x)/P_j for x ∈ Δ_j,   p_j(x) = 0 for x ∉ Δ_j,   j = 1, 2, ...,

    f_j(t) = ∫_{−∞}^{∞} e^{itx} p_j(x) dx.

It is easy to see that each p_j(x) is a probability density whose support belongs to an interval of length 2c and such that

    sup_x p_j(x) ≤ a/P_j.

Hence, by virtue of Corollary 2.2.1, the inequalities

    |f_j(t)| ≤ 1 − (P_j²/(3π²a²)) t²,    |t| ≤ π/(2c),    (2.5.9)

    |f_j(t)| ≤ 1 − P_j²/(12a²c²),    |t| ≥ π/(2c)    (2.5.10)

are true. Further,

    |f(t)| = | Σ_{j=1}^{∞} ∫_{Δ_j} e^{itx}p(x)dx | = | Σ_{j=1}^{∞} P_j f_j(t) | ≤ Σ_{j=1}^{∞} P_j |f_j(t)|.

Using (2.5.9) and (2.5.10), and taking into account that Σ_{j=1}^{∞} P_j = 1, we finally arrive at (2.5.7) and (2.5.8).

COROLLARY 2.5.4. Let the conditions of Theorem 2.5.2 be satisfied. Then, for any l > 0, the inequalities

    |f(t)| ≤ 1 − (t²/(3π²a²)) [Q_X(l)]³

for |t| ≤ π/l and

    |f(t)| ≤ 1 − [Q_X(l)]³/(3a²l²)

for |t| ≥ π/l are true.

PROOF. There exists an interval B of length l such that

    P(X ∈ B) = Q_X(l)

(see (Hengartner & Theodorescu, 1973, Corollary of Theorem 1.1.3)). Consider an arbitrary partition of the real line into disjoint intervals Δ₁, Δ₂, ... of length l each, such that Δ₁ = B. Then (2.5.7) and (2.5.8) yield

    |f(t)| ≤ 1 − (t²/(3π²a²)) [Q_X(l)]³

for |t| ≤ π/l, and

    |f(t)| ≤ 1 − [Q_X(l)]³/(12a²(l/2)²) = 1 − [Q_X(l)]³/(3a²l²)

for |t| ≥ π/l.

The estimates in what follows concern the characteristic functions of absolutely continuous distributions whose densities are of bounded variation (the definition of a function of bounded variation was given in Section 1.4). These estimates can be considered as natural generalizations of some results obtained in Section 2.4 for the characteristic functions of unimodal distributions (notice that the total variation of a unimodal bounded probability density p(x) is equal to 2 sup_x p(x)).

THEOREM 2.5.3. Let p(x) be a probability density with bounded variation V(p), and let f(t) be the corresponding characteristic function. Then

    |f(t)| ≤ (V(p)/|t|) sin(|t|/V(p))    (2.5.11)

for |t| ≤ πV(p)/2, and

    |f(t)| ≤ V(p)/|t|    (2.5.12)

for all real t.



PROOF. Let us prove the first inequality. Since the set of densities with a given total variation is closed with respect to translation, it suffices, in view of Lemma 2.1.1, to prove that

    |ℜf(t)| ≤ (V(p)/t) sin(t/V(p))    (2.5.13)

for |t| ≤ πV(p)/2. Let us fix an arbitrary t₀ such that |t₀| ≤ πV(p)/2. Without loss of generality assume that t₀ > 0. The cases ℜf(t₀) ≥ 0 and ℜf(t₀) < 0 should be considered separately. We consider only the first one: it will be seen that the second case can be treated in a similar way.

So, suppose that ℜf(t₀) ≥ 0. Denote

    B_n = [πn/t₀, π(n+1)/t₀),
    M_n = sup_{x∈B_n} p(x),
    m_n = inf_{x∈B_n} p(x),
    I_n = ∫_{B_n} p(x)dx,    n = 0, ±1, ±2, ...

We have

    ℜf(t₀) = ∫_{−∞}^{∞} cos(t₀x)p(x)dx = Σ_{n=−∞}^{∞} ∫_{B_n} cos(t₀x)p(x)dx.    (2.5.14)

Let us demonstrate that

    ∫_{B_n} cos(t₀x)p(x)dx ≤ ∫_{B_n} cos(t₀x)r_n(x)dx,    (2.5.15)

where

    r_n(x) = M_n − m_n for x ∈ [πn/t₀, πn/t₀ + z_n],   r_n(x) = 0 otherwise,

for even n, and

    r_n(x) = M_n − m_n for x ∈ [π(n+1)/t₀ − z_n, π(n+1)/t₀],   r_n(x) = 0 otherwise,

for odd n, and

    z_n = min{ π/(2t₀), (I_n − (π/t₀)m_n)/(M_n − m_n) }.

Suppose that n is even (the proof for odd n is similar). We have

    ∫_{B_n} cos(t₀x)p(x)dx = ∫_{B_n} cos(t₀x)[p(x) − m_n]dx.    (2.5.16)

Consider two cases separately:

(1) (I_n − (π/t₀)m_n)/(M_n − m_n) ≤ π/(2t₀);

(2) (I_n − (π/t₀)m_n)/(M_n − m_n) > π/(2t₀).

1. In this case,

    ∫_{B_n} r_n(x)dx = I_n − (π/t₀)m_n = ∫_{B_n} [p(x) − m_n]dx,

and, obviously,

    r_n(x) ≥ p(x) − m_n   for πn/t₀ ≤ x ≤ πn/t₀ + z_n

and

    r_n(x) = 0 ≤ p(x) − m_n   for πn/t₀ + z_n < x ≤ π(n+1)/t₀.

Therefore, by virtue of Lemma 2.1.3 (cos(t₀x) decreases on the interval B_n),

    ∫_{B_n} cos(t₀x)r_n(x)dx ≥ ∫_{B_n} cos(t₀x)[p(x) − m_n]dx.

Taking (2.5.16) into account, we arrive at (2.5.15).

2. In this case, since cos(t₀x) is negative on the interval (πn/t₀ + π/(2t₀), π(n+1)/t₀) and positive on the interval (πn/t₀, πn/t₀ + π/(2t₀)), we obtain

    ∫_{B_n} cos(t₀x)[p(x) − m_n]dx ≤ ∫_{πn/t₀}^{πn/t₀+π/(2t₀)} cos(t₀x)[p(x) − m_n]dx
        ≤ ∫_{πn/t₀}^{πn/t₀+π/(2t₀)} cos(t₀x)[M_n − m_n]dx = ∫_{B_n} cos(t₀x)r_n(x)dx.

Again, taking (2.5.16) into account, we arrive at (2.5.15).

Now, let us define the functions p̃_n(x) as follows. If n is even, then

    p̃_n(x) = r_n(x + πn/t₀) for 0 ≤ x ≤ π/(2t₀),   p̃_n(x) = 0 otherwise.

If n is odd, then

    p̃_n(x) = r_n(x + π(n+1)/t₀) for −π/(2t₀) ≤ x ≤ 0,   p̃_n(x) = 0 otherwise.

It is easy to see that

    ∫_{B_n} cos(t₀x)r_n(x)dx = ∫_{−∞}^{∞} cos(t₀x)p̃_n(x)dx,

or, since cos is an even function,

    ∫_{B_n} cos(t₀x)r_n(x)dx = ∫_{−∞}^{∞} cos(t₀x) · (1/2)(p̃_n(x) + p̃_n(−x)) dx.    (2.5.17)

Consider the function

    q_n(x) = (1/2)(p̃_n(x) + p̃_n(−x)).

It has a bounded support, the interval [−π/(2t₀), π/(2t₀)], and

    max_x q_n(x) ≤ (M_n − m_n)/2.    (2.5.18)

Indeed, p̃_n(x) and p̃_n(−x) have non-intersecting supports, therefore

    max_x (p̃_n(x) + p̃_n(−x)) = max{ max_x p̃_n(x), max_x p̃_n(−x) } = M_n − m_n.

In addition, obviously,

    ∫_{−∞}^{∞} q_n(x)dx ≤ I_n.    (2.5.19)

From (2.5.14), (2.5.15), and (2.5.17) we obtain

    ℜf(t₀) ≤ ∫_{−∞}^{∞} cos(t₀x) ( Σ_{n=−∞}^{∞} q_n(x) ) dx.    (2.5.20)

We introduce

    q(x) = Σ_{n=−∞}^{∞} q_n(x).    (2.5.21)

Then, by (2.5.19),

    ∫_{−∞}^{∞} q(x)dx ≤ 1,

and, by (2.5.18),

    sup_x q(x) ≤ (1/2) Σ_{n=−∞}^{∞} (M_n − m_n) ≤ V(p)/2.

In addition, since each p̃_n(x) vanishes outside the interval [−π/(2t₀), π/(2t₀)], the support of q(x) belongs to this interval as well. Applying Theorem 2.2.2 (with a = V(p)/2), we obtain

    ∫_{−∞}^{∞} cos(tx)q(x)dx ≤ (V(p)/t) sin(t/V(p))

for all |t| ≤ πV(p)/2; in particular,

    ∫_{−∞}^{∞} cos(t₀x)q(x)dx ≤ (V(p)/t₀) sin(t₀/V(p)).    (2.5.22)

From (2.5.20), (2.5.21), and (2.5.22) we finally obtain (2.5.13).

Now let us prove inequality (2.5.12). First we prove it in the case where p(x) is differentiable; then

    f(t) = ∫_{−∞}^{∞} e^{itx}p(x)dx = (1/(it)) ∫_{−∞}^{∞} p(x) de^{itx}
         = −(1/(it)) ∫_{−∞}^{∞} e^{itx} dp(x) = −(1/(it)) ∫_{−∞}^{∞} e^{itx}p′(x)dx,

which yields

    |f(t)| ≤ (1/|t|) ∫_{−∞}^{∞} |p′(x)|dx.

Now it suffices to notice that

    V(p) = ∫_{−∞}^{∞} |p′(x)|dx.

Now let us consider the general case where p(x) is not necessarily differentiable. Consider the convolution

    p_ε(x) = ∫_{−∞}^{∞} p(x − u) n_ε(u)du,

where n_ε(x) is the normal density function with zero mean and variance ε². The function p_ε(x) is differentiable because n_ε(x) is differentiable; hence

    |f(t) e^{−ε²t²/2}| ≤ V(p_ε)/|t|,

or, taking Lemma 2.1.12 into account,

    |f(t)| e^{−ε²t²/2} ≤ V(p)/|t|.

Letting ε → 0, we finally obtain

    |f(t)| ≤ V(p)/|t|.

Estimates (2.5.11) and (2.5.12) are sharp: for an arbitrary v > 0 and any fixed t₀ such that |t₀| ≤ πv/2, there exists a probability density p(x) such that V(p) = v and

    |f(t₀)| = (v/t₀) sin(t₀/v),

where f(t) is the characteristic function corresponding to p(x); a similar fact holds true for inequality (2.5.12).

COROLLARY 2.5.5. Let the conditions of Theorem 2.5.3 be satisfied. Then for any 0 < c ≤ π/2,

    |f(t)| ≤ 1 − h₂(c) t²/(6V²(p))

for |t| ≤ c V(p), where the function h₂(x) is defined by (2.1.1).

This corollary is obtained as a combination of inequalities (2.5.11) and (2.1.6). Setting c = π/2, in particular, we obtain the following assertion.

COROLLARY 2.5.6. Let the conditions of Theorem 2.5.3 be satisfied. Then

    |f(t)| ≤ 1 − 4t²/(3π²V²(p))

for |t| ≤ πV(p)/2.

Inequality (2.5.12) can be improved if the density function p(x) is one or several times differentiable. More exactly, the following assertion is true.
THEOREM 2.5.4. Let p(x) be a probability density and f(t) be the corresponding characteristic function. If p(x) is n − 1 times differentiable, and p^(n−1)(x) is a function with bounded variation, then

    |f(t)| ≤ V(p^(n−1))/|t|^n    (2.5.23)

for all real t.

PROOF. The proof is similar to that of inequality (2.5.12). First, suppose that p^(n−1)(x) is differentiable (i.e., p(x) is n times differentiable). The procedure used in the proof of inequality (2.5.12) can be repeated as many times as derivatives of p(x) exist. More exactly, if p(x) is n times differentiable, and its first n − 1 derivatives satisfy the condition

    lim_{|x|→∞} p^(k)(x) = 0,    k = 1, 2, ..., n − 1,

then

    f(t) = −(1/(it)) ∫_{−∞}^{∞} e^{itx}p′(x)dx = −(1/(it)²) ∫_{−∞}^{∞} p′(x) de^{itx}
         = (1/(it)²) ∫_{−∞}^{∞} e^{itx} dp′(x) = ⋯ = ((−1)^n/(it)^n) ∫_{−∞}^{∞} e^{itx}p^(n)(x)dx,

which yields

    |f(t)| ≤ (1/|t|^n) ∫_{−∞}^{∞} |p^(n)(x)|dx = V(p^(n−1))/|t|^n.

The passage to the case where p^(n−1)(x) is not differentiable can be performed in exactly the same way as in the proof of inequality (2.5.12), with the use of Lemma 2.1.13.

2.6. Estimates for the characteristic functions of discrete distributions

In this section, we present some inequalities for the characteristic functions of lattice distributions, i.e., the distributions concentrated on sets of the form {a + nh, n = 0, ±1, ±2, …}, where a and h are some real numbers, h > 0. For the sake of simplicity, we will consider only the case of a = 0 and h = 1, i.e., the case of integer-valued distributions. The general case is easily reduced to this one. Since the characteristic functions of integer-valued distributions are periodic with period 2π, it suffices to consider these characteristic functions only on the interval [−π, π].
There are (at least) two approaches to obtaining estimates for characteris-
tic functions of lattice distributions: the direct method and the method based
on the reduction of the problem to the continuous case. The latter is achieved
by multiplying the characteristic function under consideration by the charac-
teristic function of some (usually uniform) absolutely continuous distribution.
It is the second approach which is applied here.
Let X be an integer-valued random variable with distribution function F(x) and characteristic function f(t). Denote

p_k = P(X = k),  k = 0, ±1, ±2, …



Then

G(x) = ∫_{−∞}^{∞} H(x − y) dF(y),    (2.6.1)

where

H(x) = x + 1/2 for −1/2 ≤ x < 1/2,  H(x) = 0 for x < −1/2,  H(x) = 1 for x ≥ 1/2,

is the uniform distribution function on the interval [−1/2, 1/2]; G(x) is an absolutely continuous distribution function with the characteristic function

g(t) = f(t) sin(t/2)/(t/2).    (2.6.2)

Moreover, denote the density of G(x) by q(x). Then q(x) is a piecewise constant function taking the value p_k on each interval (−1/2 + k, 1/2 + k), k = 0, ±1, ±2, … (at the boundary points x = 1/2 + k, q(x) can be defined arbitrarily; for definiteness and convenience, we define it to be continuous on the right). This implies that the maximum value of the density q(x) coincides with the maximum of the numbers p_k, and the total variation of the density q(x) is given by the relation

V(q) = Σ_{k=−∞}^{∞} |p_{k+1} − p_k|.    (2.6.3)

Note also that

Q(G; z) ≤ Q_X(z)    (2.6.4)

for any z, where Q(G; z) and Q_X(z) are the concentration functions of G and X respectively (see, e.g., (Hengartner & Theodorescu, 1973)). From (2.6.1), (2.6.2), and Theorem 2.2.2, we immediately obtain the following discrete analog of Theorem 2.2.2.

THEOREM 2.6.1. Let X be an integer-valued random variable with characteristic function f(t). If |X| ≤ m and

max_k P(X = k) ≤ p,

then

|f(t)| ≤ p sin(t/(2p))/sin(t/2)

for |t| ≤ π/(2m + 1).
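The bound of Theorem 2.6.1, in the form reconstructed here, can be checked numerically. The snippet below (our illustration, not from the book) uses a centred binomial distribution, for which p = max_k P(X = k) and f(t) are available exactly; for the uniform distribution on {−m, …, m} the bound becomes an equality (a Dirichlet kernel).

```python
import math, cmath

# Binomial(4, 1/2) centred at zero: support {-2,...,2}, so m = 2
# and max_k P(X = k) = 6/16.
pk = {k - 2: math.comb(4, k) / 16 for k in range(5)}
m = 2
p = max(pk.values())

def f(t):  # modulus of the characteristic function
    return abs(sum(q * cmath.exp(1j * t * k) for k, q in pk.items()))

def bound(t):  # p*sin(t/(2p))/sin(t/2), checked on 0 < t <= pi/(2m+1)
    return p * math.sin(t / (2 * p)) / math.sin(t / 2)

tmax = math.pi / (2 * m + 1)
ok = all(f(i * tmax / 50) <= bound(i * tmax / 50) + 1e-12
         for i in range(1, 51))
```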

Now suppose that F(x) is a discrete unimodal distribution function (for the definition of discrete unimodality and some basic properties used below, see Section 1.6). Then G(x) defined by (2.6.1) is a unimodal distribution function; therefore, Theorems 2.4.1 and 2.4.3 hold for its characteristic function g(t). Taking (2.6.2) and (2.6.4) into account, we obtain the two theorems below.

THEOREM 2.6.2. Let X be an integer-valued random variable with characteristic function f(t). If the distribution of X is discrete unimodal, then

for |t| ≤ π.

COROLLARY 2.6.1. Let the conditions of Theorem 2.6.2 be satisfied. Then

for |t| ≤ π.

THEOREM 2.6.3. Let X be an integer-valued random variable with characteristic function f(t). If the distribution of X is discrete unimodal, and

max_k P(X = k) ≤ p,

then

|f(t)| ≤ p/sin(t/2)

for 0 < |t| ≤ π.

Denote the right-hand side of (2.6.3) by V_d(F). Thus, V_d(F) is the discrete analog of the total variation. Using (2.6.2), (2.6.3), and Theorem 2.5.3, we arrive at the following assertion.

THEOREM 2.6.4. Let X be an integer-valued random variable with distribution function F(x) and characteristic function f(t). Then

|f(t)| ≤ V_d(F) sin(t/V_d(F)) / (2 sin(t/2))

for 0 < |t| ≤ πV_d(F)/2.
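As a numerical check of Theorem 2.6.4 (our illustration, with the bound as reconstructed here), take the Poisson law with parameter 2. It is discrete unimodal, so its discrete total variation V_d(F) collapses to twice the maximal probability, and |f(t)| has a closed form.

```python
import math, cmath

# Poisson(2) is discrete unimodal; for a unimodal lattice law the discrete
# total variation V_d(F) equals twice the maximal probability.
lam = 2.0
pk = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(60)]
Vd = pk[0] + sum(abs(pk[k + 1] - pk[k]) for k in range(59))

def f(t):  # |f(t)| for the Poisson law
    return abs(cmath.exp(lam * (cmath.exp(1j * t) - 1)))

def bound(t):
    return Vd * math.sin(t / Vd) / (2 * math.sin(t / 2))

tmax = math.pi * Vd / 2
ok = all(f(i * tmax / 40) <= bound(i * tmax / 40) + 1e-12
         for i in range(1, 41))
```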



2.7. Inequalities for multivariate characteristic functions

In this section, we use the same notation as in Section 1.8; in particular, U^(r) and V^(r) are the uniform distributions on the m-dimensional sphere of radius r centered at zero and in this sphere (in the ball), and φ^(r)(t) and ψ^(r)(t) are the corresponding characteristic functions (they are given by formulas (1.8.6) and (1.8.7) respectively). The one-dimensional marginal densities of U^(r), m ≥ 2, and V^(r), m ≥ 1, are denoted by u^(r)(x) and v^(r)(x). They are given by formulas (1.8.3) and (1.8.5).

We begin with multi-dimensional generalizations of Theorems 1.4.6 and 1.4.8. Recall that a distribution in R^m is said to be non-degenerate if it is not concentrated on a hyperplane of dimension smaller than m. We also say that a random vector is non-degenerate if its distribution is non-degenerate.

THEOREM 2.7.1. Let f(t) be the characteristic function of a non-degenerate distribution in R^m. Then there exist positive numbers ε and δ such that

|f(t)| ≤ 1 − ε||t||²  for ||t|| ≤ δ.

To prove the theorem, we need the following lemma.

LEMMA 2.7.1. Let f(t) be the characteristic function of a non-degenerate distribution. Then there exist a > 0 and a non-degenerate distribution concentrated in the ball ||x|| ≤ a whose characteristic function g(t) satisfies the relation

|ℜf(t)| ≤ ℜg(t)

for all ||t|| ≤ π/(2a).

PROOF. Let X be a random vector having the characteristic function f(t). For any c > 0, we set

Y_c = X if ||X|| ≤ c,  Y_c = 0 otherwise.

Since the distribution of X is non-degenerate, there exists c₀ such that the distributions of all Y_c are non-degenerate for c ≥ c₀. Set a = c₀. Denote the distribution functions of X and Y_a by F(x) and G(x) respectively. For ||t|| ≤ π/(2a) (so that cos⟨t, x⟩ ≥ 0 whenever ||x|| ≤ a) we see that

|ℜf(t)| = |∫_{R^m} cos⟨t, x⟩ dF(x)|
≤ ∫_{||x||≤a} cos⟨t, x⟩ dF(x) + ∫_{||x||>a} |cos⟨t, x⟩| dF(x)
≤ ∫_{||x||≤a} cos⟨t, x⟩ dF(x) + P(||X|| > a)
= ∫_{R^m} cos⟨t, x⟩ dG(x) = ℜg(t).

PROOF OF THEOREM 2.7.1. Let X be a random vector whose characteristic function is f(t). We can assume that f(t) is symmetric (real). Indeed, if the theorem is proved for all symmetric characteristic functions, then, for an arbitrary characteristic function f(t),

|f(t)|² ≤ 1 − ε₀||t||²,  ||t|| ≤ δ₀,

for some ε₀ > 0, δ₀ > 0, which yields

|f(t)| ≤ 1 − (ε₀/2)||t||²,  ||t|| ≤ δ₀.

Thus, by virtue of Lemma 2.7.1, without loss of generality, we can assume that X is bounded with probability one: P(||X|| ≤ a) = 1 for some a > 0.
Let us demonstrate that there exists ε₁ > 0 such that

inf_{||e||=1} Var⟨e, X⟩ ≥ ε₁,    (2.7.1)

where the infimum is taken over all unit vectors in R^m.

We assume the contrary:

inf_{||e||=1} Var⟨e, X⟩ = 0.

Then there exists a sequence of unit vectors e₁, e₂, …, such that

Var⟨e_n, X⟩ → 0 as n → ∞.    (2.7.2)

Since the unit sphere in R^m is a compact set, there exists a limit point of this sequence, say, a vector e₀. Without loss of generality we assume that the sequence e₁, e₂, … converges to e₀:

||e_n − e₀|| → 0 as n → ∞.

We set d_n = e_n − e₀, and obtain

Var⟨e_n, X⟩ = Var⟨e₀, X⟩ + Var⟨d_n, X⟩ + 2E(⟨e₀, X⟩⟨d_n, X⟩) − 2E⟨e₀, X⟩·E⟨d_n, X⟩,

i.e.,

Var⟨e₀, X⟩ = Var⟨e_n, X⟩ − Var⟨d_n, X⟩ − 2E(⟨e₀, X⟩⟨d_n, X⟩) + 2E⟨e₀, X⟩·E⟨d_n, X⟩.    (2.7.3)

It is easy to see that, as n → ∞,

|Var⟨d_n, X⟩| = |E⟨d_n, X⟩² − (E⟨d_n, X⟩)²|
≤ ||d_n||² E||X||² + (⟨d_n, EX⟩)² ≤ ||d_n||² E||X||² + ||d_n||² ||EX||²
= ||d_n||² (E||X||² + ||EX||²) → 0,    (2.7.4)

|E(⟨e₀, X⟩⟨d_n, X⟩)| ≤ ||e₀|| ||d_n|| E||X||² → 0,    (2.7.5)

|E⟨e₀, X⟩·E⟨d_n, X⟩| ≤ ||e₀|| ||d_n|| (E||X||)² → 0.    (2.7.6)

From (2.7.2)–(2.7.6) we obtain

Var⟨e₀, X⟩ = 0.

But this is possible only if the distribution of X − EX is concentrated on a hyperplane orthogonal to e₀, i.e., if X is degenerate. This contradiction proves (2.7.1). Note that

Var⟨e, X⟩ = eΣe′,

where Σ is the covariance matrix of X; therefore, since the covariance matrix of a non-degenerate random vector is positive definite, (2.7.1) can also be obtained from the properties of positive definite matrices.

Now let ||t|| ≤ π/(4a). Put e = t/||t||. Then, making use of Corollary 2.2.2, (1.8.1), and (2.7.1), we obtain (recall that f_e(t) denotes the characteristic function of the random variable ⟨e, X⟩)

|f(t)| = |f_e(||t||)| ≤ 1 − (1/4) Var⟨e, X⟩ ||t||² ≤ 1 − (ε₁/4)||t||².

Now it suffices to set ε = ε₁/4 and δ = π/(4a).
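For a Gaussian law the conclusion of Theorem 2.7.1 can be checked explicitly. In the sketch below (our illustration; the choice of ε and δ is ours, not the book's) we use that exp(−q/2) ≤ 1 − q/4 whenever 0 ≤ q ≤ 2, so ε = λ_min/4 works for ||t|| ≤ √(2/λ_max), where λ_min, λ_max are the extreme eigenvalues of the covariance matrix.

```python
import math

# Gaussian special case of Theorem 2.7.1: for covariance S,
# |f(t)| = exp(-t S t'/2) <= 1 - eps*||t||^2 with eps = lmin/4,
# valid while t S t' <= 2, i.e. for ||t|| <= sqrt(2/lmax).
S = [[2.0, 0.6], [0.6, 0.5]]          # positive definite 2x2 covariance
tr = S[0][0] + S[1][1]
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
disc = math.sqrt(tr * tr - 4 * det)
lmin, lmax = (tr - disc) / 2, (tr + disc) / 2

eps = lmin / 4
delta = math.sqrt(2 / lmax)

def absf(t1, t2):
    q = S[0][0] * t1 * t1 + 2 * S[0][1] * t1 * t2 + S[1][1] * t2 * t2
    return math.exp(-q / 2)

ok = True
for i in range(40):                    # directions around the circle
    ang = 2 * math.pi * i / 40
    for frac in (0.2, 0.6, 1.0):       # radii up to delta
        t1 = frac * delta * math.cos(ang)
        t2 = frac * delta * math.sin(ang)
        ok = ok and absf(t1, t2) <= 1 - eps * (t1 * t1 + t2 * t2) + 1e-12
```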

Theorem 1.4.8 can be generalized to the multi-dimensional case in several different ways. The simplest variant is given by the following proposition. We set

δ_ij = 1 if i = j,  δ_ij = 0 if i ≠ j.

THEOREM 2.7.2. Let F(x) be a distribution function with characteristic function f(t) = f(t₁, …, t_m). Then for u > 0

∫_{||x||≥1/u} dF(x) ≤ (7/(mu)) Σ_{j=1}^m ∫₀^{mu} [1 − ℜf(δ_{1j}t, …, δ_{mj}t)] dt    (2.7.7)

and

∫_{||x||≥2/u} dF(x) ≤ (1/(mu)) Σ_{j=1}^m ∫_{−mu}^{mu} [1 − ℜf(δ_{1j}t, …, δ_{mj}t)] dt.

PROOF. We will prove only the first inequality. The second one can be proved similarly. Let X = (X₁, …, X_m) be a random vector with distribution function F(x). We have

∫_{||x||≥1/u} dF(x) = P(√(X₁² + ⋯ + X_m²) ≥ 1/u) ≤ Σ_{j=1}^m P(|X_j| ≥ 1/(mu)),

since ||X|| ≥ 1/u implies |X_j| ≥ 1/(mu) for at least one j. Taking into account that for each j = 1, …, m, the characteristic function of X_j is f(δ_{1j}t, …, δ_{mj}t), and applying Theorem 1.4.8 to each summand of the right-hand side, we arrive at (2.7.7).
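The one-dimensional ingredient here is a Theorem 1.4.8-type truncation inequality, P(|X| ≥ 1/u) ≤ (7/u)∫₀^u [1 − ℜf(t)] dt, with the constant 7 assumed in this sketch. It is easy to probe numerically (our illustration) for the standard normal law, where both sides have closed or easily computed forms.

```python
import math

# Truncation inequality P(|X| >= 1/u) <= (7/u) int_0^u (1 - Re f(t)) dt,
# checked for X ~ N(0,1), f(t) = exp(-t^2/2).
def f(t): return math.exp(-t * t / 2)

def Phi(x):  # standard normal cdf via erf
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def rhs(u, n=2000):
    # midpoint rule for (7/u) * int_0^u (1 - f(t)) dt
    h = u / n
    return (7 / u) * sum(1 - f((i + 0.5) * h) for i in range(n)) * h

ok = all(2 * (1 - Phi(1 / u)) <= rhs(u) + 1e-9
         for u in (0.25, 0.5, 1.0, 2.0))
```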

Another multivariate analog of Theorem 1.4.8 is given by the following theorem. Consider the cube

I_m = {x = (x₁, …, x_m): −1 ≤ x₁ ≤ 1, …, −1 ≤ x_m ≤ 1},

and let e^(j) = (e₁^(j), …, e_m^(j)), j = 1, …, 2^m, be the vertices of I_m numbered so that e^(j+2^(m−1)) = −e^(j), j = 1, …, 2^(m−1).

THEOREM 2.7.3. Let F(x) = F(x₁, …, x_m) be a distribution function with characteristic function f(t) = f(t₁, …, t_m). Then for u > 0

∫_{|x₁|+⋯+|x_m| ≥ 1/u} dF(x₁, …, x_m) ≤ (7/u) Σ_{j=1}^{2^(m−1)} ∫₀^u [1 − ℜf(e₁^(j)t, …, e_m^(j)t)] dt.

PROOF. Let E₁, …, E_{2^m} denote the '(1/2^m)th spaces' of R^m labeled by e^(1), …, e^(2^m) respectively, i.e., e^(j) ∈ E_j, j = 1, …, 2^m. This means that if x = (x₁, …, x_m) ∈ E_j, j = 1, …, 2^(m−1), then e_k^(j)x_k = |x_k|, k = 1, …, m, and if x = (x₁, …, x_m) ∈ E_{j+2^(m−1)}, j = 1, …, 2^(m−1), then e_k^(j)x_k = −|x_k|, k = 1, …, m.

Denote C_j = E_j ∪ E_{j+2^(m−1)},

A_u = {x = (x₁, …, x_m): s(x) ≥ 1/u},

and

s(x) = Σ_{k=1}^m |x_k|.

Then

(7/u) Σ_{j=1}^{2^(m−1)} ∫₀^u [1 − ℜf(e₁^(j)t, …, e_m^(j)t)] dt
= (7/u) Σ_{j=1}^{2^(m−1)} ∫₀^u ∫_{R^m} [1 − cos(t(e₁^(j)x₁ + ⋯ + e_m^(j)x_m))] dF(x) dt
= 7 Σ_{j=1}^{2^(m−1)} ∫_{R^m} [1 − sin(u(e₁^(j)x₁ + ⋯ + e_m^(j)x_m))/(u(e₁^(j)x₁ + ⋯ + e_m^(j)x_m))] dF(x)
≥ 7 Σ_{j=1}^{2^(m−1)} ∫_{C_j ∩ A_u} [1 − sin(us(x))/(us(x))] dF(x)
≥ 7·(1/7) Σ_{j=1}^{2^(m−1)} ∫_{C_j ∩ A_u} dF(x) = ∫_{A_u} dF(x),

because 1 − sin x/x ≥ 1/7 for x ≥ 1.


One more multivariate analog of Theorem 1.4.8 is given by the following theorem.

THEOREM 2.7.4. Let F(x), x ∈ R^m, be a distribution function with characteristic function f(t). Then for u > 0

∫_{||x||≥α/u} dF(x) ≤ (c(α)/u^m) ∫_{||t||≤u} [1 − ℜf(t)] dt    (2.7.8)

for any α > √(2(m+1)/π), where

c(α) = Γ(m/2 + 1) / [π^(m/2) (1 − √(2(m+1)/π)/α)].

PROOF. First let us prove that

|ψ^(r)(t)| ≤ √(2(m+1)/π) / (r||t||)    (2.7.9)

for all t ≠ 0 (here ψ^(r)(t) is the characteristic function of the uniform distribution in the m-dimensional ball of radius r centered at zero). From (1.8.5) and Lemma 2.1.6 we obtain

sup_x v^(r)(x) = v^(r)(0) = Γ((m+2)/2) / (r√π Γ((m+1)/2)) ≤ (1/r)√((m+1)/(2π)).

Let ψ₁^(r)(t) be the characteristic function of v^(r)(x). Then, using Theorem 2.4.3 (v^(r)(x) is unimodal), we obtain

|ψ₁^(r)(t)| ≤ √(2(m+1)/π) / (r|t|)

for all t ∈ R¹, t ≠ 0. But ψ^(r)(t) = ψ₁^(r)(||t||); (2.7.9) is thus proved.

Let

l(u) = π^(m/2) u^m / Γ(m/2 + 1)

(the m-dimensional volume of the ball of radius u). Using (2.7.9), we obtain

(1/l(u)) ∫_{||t||≤u} [1 − ℜf(t)] dt
= 1 − (1/l(u)) ∫_{||t||≤u} [∫_{R^m} cos⟨t, x⟩ dF(x)] dt
= 1 − ∫_{R^m} [(1/l(u)) ∫_{||t||≤u} cos⟨t, x⟩ dt] dF(x)
= 1 − ∫_{R^m} ψ^(u)(x) dF(x) = ∫_{R^m} [1 − ψ^(u)(x)] dF(x)
≥ ∫_{||x||≥α/u} [1 − ψ^(u)(x)] dF(x) ≥ (1 − (1/α)√(2(m+1)/π)) ∫_{||x||≥α/u} dF(x),

i.e., (2.7.8) is true.
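Inequality (2.7.9), with the constant as reconstructed here, can be probed numerically (our illustration) for m = 3, where the characteristic function of the uniform distribution in the ball has the elementary form ψ(t) = 3(sin u − u cos u)/u³, u = rt.

```python
import math

# m = 3: cf of the uniform distribution in the ball of radius r; the
# constant B = sqrt(2(m+1)/pi) is the one appearing in (2.7.9) above.
m, r = 3, 1.0
B = math.sqrt(2 * (m + 1) / math.pi)

def psi(t):
    u = r * t
    return 3 * (math.sin(u) - u * math.cos(u)) / u ** 3

ok = all(abs(psi(0.1 * k)) <= B / (r * 0.1 * k) + 1e-12
         for k in range(1, 300))
```

Numerically sup_t t|ψ(t)| ≈ 1.31 for m = 3, while B ≈ 1.60, so the bound holds with some slack.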


Many results of Sections 2.2-2.7 can be quite easily extended to the multi-
dimensional case.

THEOREM 2.7.5. Let X be an m-dimensional random vector with probability density p(x) and characteristic function f(t). If ||X|| ≤ c and

sup_{x∈R^m} p(x) ≤ a,

then

(2.7.10)

and

    (2.7.11)

where

σ_a = π^((m−1)/2) c^(m−1) a / Γ((m+1)/2).

PROOF. Let X = (X₁, …, X_m). Without loss of generality we can assume that t is of the form t = (t, 0, …, 0), where t = ||t||, i.e., f(t) = f₁(t), where f₁(t) is the characteristic function of X₁ (the general case can be reduced to the considered one by rotation). Denote the probability density of X₁ by p₁(x); then

p₁(x) = ∫_{R^(m−1)} p(x, x₂, …, x_m) dx₂ ⋯ dx_m = ∫_{x₂²+⋯+x_m² ≤ c²} p(x, x₂, …, x_m) dx₂ ⋯ dx_m
≤ a ∫_{x₂²+⋯+x_m² ≤ c²} dx₂ ⋯ dx_m = a π^((m−1)/2) c^(m−1) / Γ((m+1)/2) = σ_a.

Applying Theorem 2.2.2 (we obviously have |X₁| ≤ c), we arrive at (2.7.10) and (2.7.11).
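The majorant σ_a used in this proof is sharp: it is attained by the uniform distribution in the ball of radius c, whose marginal density at the origin equals a times the volume of the (m−1)-dimensional ball of radius c. The snippet below (our illustration) verifies this for m = 3, where the marginal at 0 has the known closed form 3/(4c).

```python
import math

# Uniform distribution in the ball of radius c in R^m attains the bound
# sigma_a = a * pi^((m-1)/2) * c^(m-1) / Gamma((m+1)/2).
m, c = 3, 2.0
vol_m = math.pi ** (m / 2) * c ** m / math.gamma(m / 2 + 1)  # ball volume
a = 1 / vol_m                                                # uniform density
sigma_a = math.pi ** ((m - 1) / 2) * c ** (m - 1) * a / math.gamma((m + 1) / 2)
exact_marginal_at_0 = 3 / (4 * c)   # closed form of the marginal for m = 3
```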

COROLLARY 2.7.1. Let the conditions of the theorem be satisfied. Then

lltll2

and
s i-isp * >

THEOREM 2.7.6. Let X be a bounded m-dimensional random vector, ||X|| ≤ c, with characteristic function f(t) and covariance matrix Σ. For any α ∈ [0, π/4],

|f(t)| ≤ 1 − h₂(α)·tΣt′/2

for ||t|| ≤ α/c (the functions h₁(α) and h₂(α) are defined by (2.1.1)).

PROOF. Without loss of generality we can assume that EX = 0. Let α ∈ [0, π/4]. Fix an arbitrary t such that ||t|| ≤ α/c and set t₀ = t/||t||. Consider the random variable ⟨t₀, X⟩. Denote its characteristic function by φ(t). Then

f(t) = E e^{i⟨t,X⟩} = E e^{i||t||⟨t₀,X⟩} = φ(||t||).

We have

|⟨t₀, X⟩| ≤ ||X|| ≤ c,

so, applying Theorem 2.2.3 to the random variable ⟨t₀, X⟩ and taking into account that

Var⟨t₀, X⟩ = t₀Σt₀′ = tΣt′/||t||²,

we obtain

|f(t)| = |φ(||t||)| ≤ 1 − h₂(α)·(tΣt′/||t||²)·||t||²/2 = 1 − h₂(α)·tΣt′/2.

In what follows, when we speak about the covariance matrix of a distribution, we assume that the corresponding second-order moments are finite. Theorems 2.7.7–2.7.11 below, which are multi-dimensional generalizations of Corollary 2.3.1, Theorems 2.3.2–2.3.4, and Corollary 2.3.2, can be obtained in exactly the same way as Theorem 2.7.6.

THEOREM 2.7.7. Let X be an m-dimensional random vector with characteristic function f(t) and covariance matrix Σ. If X has a symmetric distribution and E||X||⁴ < ∞, then

1 − tΣt′/2 ≤ f(t) ≤ 1 − tΣt′/2 + (E||X||⁴/24)||t||⁴

for all t ∈ R^m.

THEOREM 2.7.8. Let F(x) be an m-dimensional distribution function with characteristic function f(t) and covariance matrix Σ. Then

|f(t)| ≥ 1 − tΣt′/2

for all t ∈ R^m.

THEOREM 2.7.9. Let X be an m-dimensional random vector with characteristic function f(t). If β₁ = E||X|| < ∞, then

ℜf(t) ≥ 1 − β₁||t||

for all t ∈ R^m.
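Theorem 2.7.9 rests on cos u ≥ 1 − |u| and |⟨t, X⟩| ≤ ||t|| ||X||. A quick numerical check (our illustration) for the standard normal law in R², where ℜf(t) = exp(−||t||²/2) and β₁ = E||X|| = √(π/2) (the chi distribution with two degrees of freedom):

```python
import math

# X standard normal in R^2: Re f(t) depends only on r = ||t||.
beta1 = math.sqrt(math.pi / 2)

def re_f(r):
    return math.exp(-r * r / 2)

ok = all(re_f(0.05 * k) >= 1 - beta1 * 0.05 * k - 1e-12
         for k in range(1, 201))
```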

To obtain a multi-dimensional analog of Theorem 2.3.4, we need the fol-


lowing trivial generalization of inequality (2.1.3).

LEMMA 2.7.2. Let Σ be an m×m non-negative definite matrix and c ∈ (0, √2). Then

exp{−κ(c)·tΣt′/2} ≤ 1 − tΣt′/2    (2.7.12)

for all t ∈ R^m such that tΣt′ ≤ c², where κ(c) is the function from inequality (2.1.3).

Inequality (2.7.12) immediately follows from inequality (2.1.3) if we set

t = √(tΣt′).
Making use of Lemma 2.7.2, we obtain (in the same way as Theorem 2.7.6) the following generalization of Theorem 2.3.4.

THEOREM 2.7.10. Let F(x) be an m-dimensional distribution function with characteristic function f(t) and covariance matrix Σ. Then for any α ∈ (0, √2),

|f(t)| ≥ exp{−κ(α)·tΣt′/2}

for all t ∈ R^m satisfying the relation tΣt′ ≤ α².


THEOREM 2.7.11. Let X be a random vector with zero expectation, characteristic function f(t), and covariance matrix Σ. If E||X||^(2+δ) < ∞ for some 0 < δ ≤ 1, then

|f(t)| ≤ 1 − tΣt′/2 + c(δ)(E||X||^(2+δ) + E||X||²·E||X||^δ)||t||^(2+δ)

for all t ∈ R^m, where c(δ) is defined by (2.1.12).

A multi-dimensional analog of Theorem 2.3.7 can be derived in exactly the same way as in the one-dimensional case. We give only its formulation.

THEOREM 2.7.12. Let X be a random vector with characteristic function f(t) and covariance matrix Σ. If

E||X||^(2+δ) < ∞

for some 0 < δ ≤ 1, then for any 0 < γ < 1 and any ε > 0 there exists Δ > 0 such that

|f(t)| ≤ 1 − γ·tΣt′/2 + ε||t||^(2+δ)

for ||t|| ≤ Δ.


Now we give some estimates for the characteristic functions of absolutely continuous multivariate distributions, making use of estimates for the characteristic functions of distributions with bounded support. The proofs are very similar to those in the one-dimensional case.

As in Section 2.5, let g(x), x ≥ 0, be a non-negative increasing function such that g(x) → ∞ as x → ∞, whose inverse function is denoted by g⁻¹(x).

THEOREM 2.7.13. Let X be an m-dimensional random vector with probability density function p(x) and characteristic function f(t), and let δ be an arbitrary real number such that 0 < δ < 1. If

sup_{x∈R^m} p(x) ≤ a < ∞

and

Eg(||X||) < ∞,

then
|/(t)|<l-^^||t||2 (2.7.13)

for ||t|| ≤ π/(2c), and

(2 7 14)
i ^ - w

for ||t|| > π/(2c), where

c = g⁻¹(Eg(||X||)/δ)

and

σ₀ = π^((m−1)/2) c^(m−1) a / Γ((m+1)/2).    (2.7.15)

The proof is similar to the proof of Theorem 2.5.1: we set

δ_c = P(||X|| > c)

and

q(x) = p(x)/(1 − δ_c) for ||x|| ≤ c,  q(x) = 0 for ||x|| > c.

Then, exactly in the same way as in the proof of Theorem 2.5.1, we obtain

|f(t)| ≤ (1 − δ_c) |∫_{R^m} e^{i⟨t,x⟩} q(x) dx| + δ_c,

and taking into account that δ_c ≤ δ and making use of Corollary 2.7.1, we finally arrive at (2.7.13) and (2.7.14).

COROLLARY 2.7.2. Let X be an m-dimensional random vector with probability density function p(x) and characteristic function f(t), and let δ be an arbitrary real number such that 0 < δ < 1. Assume that

sup_{x∈R^m} p(x) ≤ a < ∞

and

E||X||^α < ∞,  α > 0,

and denote

γ_α = min_{b∈R^m} E||X − b||^α.

Then
2
(1 - 5)352(m-l)/a ^ ( W ^ j
2
l/(t)| 1
- 3rt m + l y 2(m-l)/a a 2 ^

for ||t|| < {/)1, and

(i - )32/ [r
l/(t)| 12nm-lY2m/aa2

for ||t|| > (/).

The inequalities given by Theorem 2.7.13 and its corollary are isotropic: their right-hand sides depend only on the length of the vector t. On the one hand, this makes them quite simple. But on the other hand, sometimes these estimates are rough: an estimate of this kind does not take into account that a distribution can be elongated in some directions and compressed in others. It gives the same bound (which obviously must be the worst one) in all directions.

To obtain more accurate estimates, we can use individual truncations of a random vector in each direction instead of a common truncation for all directions (which was used in Theorem 2.7.13). This approach is used in the following two estimates.

THEOREM 2.7.14. Let X = (X₁, …, X_m) be an m-dimensional random vector with probability density p(x), characteristic function f(t), and covariance matrix Σ. If

sup_{x∈R^m} p(x) ≤ a,

then

l/(t)|
- 1
-
9 [ {)[ (2 7 16)


for √(tΣt′) ≤ π/4, and

9 [()] 2 1 2
(2.7.17)
|/(t)| ~ 1 2" +12 " - 1 2 ( 2 )" - 1 ( / )

for √(tΣt′) > π/4, where β₂ = E||X||².

PROOF. The proof, in general, repeats the arguments of the proof of Theorem 2.5.1. Without loss of generality, we assume that EX = 0. Let us fix an arbitrary t. For the sake of simplicity, suppose that t is of the form t = (t, 0, …, 0). Let c be a positive constant (it will be chosen later). We set

δ_c = P(X₂² + ⋯ + X_m² > c²),

B = {x ∈ R^m: x₂² + ⋯ + x_m² ≤ c²},

g(x) = p(x)/(1 − δ_c) for x ∈ B,  g(x) = 0 otherwise.

Denote the characteristic function corresponding to g(x) by g(t). Then (see the proof of Theorem 2.5.1)

|f(t)| ≤ (1 − δ_c)|g(t)| + δ_c = (1 − δ_c)|g₁(t)| + δ_c,    (2.7.18)

where t = ||t|| and g₁(t) is the characteristic function of the first marginal density g₁(x) of g(x); this marginal density is majorized in the same way as in the proof of Theorem 2.7.5. Therefore, by virtue of Corollary 2.5.2,

912
kiwi ^ - (1 - <5C)2
^ 1 ^ - 6 4 * - - *

if √(tΣt′) ≤ π/4, and

(1 - 6C)2

if √(tΣt′) > π/4. Taking (2.7.18) into account, we obtain

l/(t)| < 1 - (1 - <5C)3




for √(tΣt′) ≤ π/4, and

l/<t)|S1 ( 1 4 ) 3
- 1024(W"-" "

for ^/ffit7 > /4. Set c 2 = 2 2 . Then

c2 " c2 2'
and we finally arrive at (2.7.16) and (2.7.17).

COROLLARY 2.7.3. Let the conditions of Theorem 2.7.14 be satisfied. Then

9|r(^)l2||t||2
|/(t)| < exp

The corollary can be obtained in exactly the same way as Corollary 2.5.3.
An estimate similar to the estimate of the corollary is given by the following
theorem.

THEOREM 2.7.15. Let X = (X₁, …, X_m) be an m-dimensional random vector with probability density p(x), characteristic function f(t), and covariance matrix Σ. If

sup_{x∈R^m} p(x) ≤ a,

then
then
f 2 tSt'((m l)!!)
1
2
1
f(t)
1 1
< exp < 2
^ 1
- = =2 - }
\ 27 ||(8)' 7" - (2 + y/m\/m/) J
where |Σ| is the determinant of the matrix Σ.

PROOF. Without loss of generality we assume that X has zero expectation. For t = 0 the assertion of the theorem is obvious, so we suppose that t ≠ 0. We have

(1/4)(1 − |f(2πt)|²) ≤ ∫_{R^m} sin²(π⟨t, x⟩) p(x) dx.

Let ε and r be positive numbers (ε < 1). Divide the space R^m into three disjoint sets:

A₁ = {x ∈ R^m: sin²(π⟨t, x⟩) ≥ ε},

A₂ = {x ∈ R^m: sin²(π⟨t, x⟩) < ε, xΣ⁻¹x′ ≤ r²},

A₃ = {x ∈ R^m: sin²(π⟨t, x⟩) < ε, xΣ⁻¹x′ > r²}.

The values of ε and r will be chosen later. We have

∫_{R^m} sin²(π⟨t, x⟩) p(x) dx ≥ ε ∫_{A₁} p(x) dx

and

∫_{A₃} p(x) dx ≤ ∫_{xΣ⁻¹x′>r²} p(x) dx ≤ (1/r²) E(XΣ⁻¹X′) = m/r²,

since E(XΣ⁻¹X′) = (1/|Σ|) Σ_{i=1}^m Σ_{j=1}^m A_ij E(X_iX_j) = m, where A_ij is the cofactor of Σ corresponding to the element E(X_iX_j).

Thus,

(1/4)(1 − |f(2πt)|²) ≥ ε ∫_{A₁} p(x) dx ≥ ε(1 − ∫_{A₂} p(x) dx − m/r²).    (2.7.19)

Let us estimate the integral

I = ∫_{A₂} p(x) dx

and choose the parameters ε and r. For any real x, we denote its distance from the nearest integer by [x]₀, i.e.,

[x]₀ = min_{k=0,±1,±2,…} |x − k|.

Since

|sin(πu)| ≥ 2[u]₀,

the relations

A₂ ⊂ {x: 2[⟨t, x⟩]₀ ≤ √ε, xΣ⁻¹x′ ≤ r²} ⊂ {x: |⟨t, x⟩ − k| ≤ √ε/2 for some k = 0, ±1, ±2, …, xΣ⁻¹x′ ≤ r²}

are true.

The matrix Σ⁻¹, which is inverse to a non-degenerate covariance matrix Σ, can be represented in the form of the product of two matrices A and A′: Σ⁻¹ = AA′. Taking into account that p(x) ≤ a and changing the variables y = xA, we obtain

I ≤ (a/|A|) Σ_k ∫_{B_k} dy = Σ_k L_k,    (2.7.20)

where

B_k = {y: |⟨y, t(A′)⁻¹⟩ − k| ≤ √ε/2, k = 0, ±1, ±2, …, ||y||² ≤ r²}.

Consider the case k = 0. In the integral in (2.7.20), we rotate the coordinate system in such a way that the hyperplane ⟨y, t(A′)⁻¹⟩ = √ε/2 becomes parallel to some coordinate hyperplane of the new coordinate system (x₁, x₂, …, x_m). We obtain

= T77 / Xj\< /- dxj dxi...dxj-idxj+...


1 1 J\^2||A')- H ^ J 7||11||1<
1 Jl 1 1
s
= ao J
f2||t(A')-l||,(r2 J2yn-1
Xj) axj,

where
a 2{2nfm~1)l2
ao =
|| (m 1)!!
for even m and
2(2/2
ao =
|A| n(m - 1)!!
for odd m. It is easy to see that

y/2i(2 n r ^ '
L -]Af||t(A')-M|(m-l)!! (2,7'21)

for all m.
By the definition of B_k we have

|k| ≤ √ε/2 + |⟨t, x⟩| ≤ |⟨t, x⟩| + 1.

Since 0 < ε < 1 and xΣ⁻¹x′ ≤ r², we obtain

|⟨t, x⟩| ≤ √((tΣt′)(xΣ⁻¹x′)) ≤ r√(tΣt′)    (2.7.22)

and

|k| ≤ r√(tΣt′) + 2.
Making use of (2.7.19)–(2.7.22) and taking into account that

and
tst' < HtCAO-^lllA-Vll,
we come to the conclusion that

" ^ "--V
yfiim -1)!! V Vm/J
<272
Besides,
/2|(2/2/2 ^ 1 2m
y/nm\\ ~ r2 '

so, setting r2 = 4m, in (2.7.23) we can choose

r- _ / _ 4m \ (3 /2^(2 2 ) ( " ~ 1)/2 (2 + 2/(tt'))\


V r2 J y y/n(m -1)!! J

Under this choice of ε and r, we obtain the assertion of the theorem.

To conclude this section, we present some inequalities concerning spherically symmetric distributions. The definition and basic properties of spherically symmetric distributions were given in Section 1.8. Without loss of generality, we will consider only distributions which are spherically symmetric about the origin. First let us establish the following fact.

LEMMA 2.7.3. Let X = (X₁, …, X_m) and Y = (Y₁, …, Y_m) be two spherically symmetric random vectors. If

P(||X|| ≤ r) ≥ P(||Y|| ≤ r)    (2.7.24)

for any r > 0, then

P(|X₁| ≤ a) ≥ P(|Y₁| ≤ a)    (2.7.25)

for any a > 0.

PROOF. Let F and G be the distributions of X and Y respectively. By virtue of (1.8.2), there exist univariate distribution functions H_F(x) and H_G(x) (each concentrated on the non-negative half-line, i.e., H_F(−0) = 0 and H_G(−0) = 0) such that

F(A) = ∫₀^∞ U^(r)(A) dH_F(r)    (2.7.26)

for any Borel set A ⊂ R^m, and

G(A) = ∫₀^∞ U^(r)(A) dH_G(r),    (2.7.27)

where U^(r) is the uniform distribution on the surface of the m-dimensional sphere of radius r with center at zero.

Suppose that (2.7.24) holds. Denote

S_r = {x ∈ R^m: ||x|| ≤ r}.

Then

U^(r)(S_a) = 0 if r > a, and U^(r)(S_a) = 1 if r ≤ a;

therefore (2.7.26) and (2.7.27) yield

P(||X|| ≤ a) = F(S_a) = ∫₀^∞ U^(r)(S_a) dH_F(r) = H_F(a)

and

P(||Y|| ≤ a) = H_G(a).

Hence (2.7.24) means that

H_F(a) ≥ H_G(a)    (2.7.28)

for all a > 0. Denote

E_a = {x = (x₁, …, x_m) ∈ R^m: |x₁| ≤ a, −∞ < x_j < ∞, j = 2, …, m}.

It is easy to see that

U^(x)(E_a) ≥ U^(y)(E_a)

when 0 < x ≤ y, i.e., the function φ(r) = U^(r)(E_a) decreases (for any fixed a). Therefore from (2.7.26) and (2.7.27), taking (2.7.28) into account and making use of Lemma 2.1.2, we obtain

P(|X₁| ≤ a) = F(E_a) = ∫₀^∞ U^(r)(E_a) dH_F(r) = ∫₀^∞ φ(r) dH_F(r)
≥ ∫₀^∞ φ(r) dH_G(r) = ∫₀^∞ U^(r)(E_a) dH_G(r) = G(E_a) = P(|Y₁| ≤ a),

i.e., (2.7.25) is true.

By the way, the assertion converse to Lemma 2.7.3 also holds, i.e., (2.7.25) implies (2.7.24). Of course, relation (2.7.25) in the formulation of the lemma can be replaced with the relation

P(|⟨X, e⟩| ≤ a) ≥ P(|⟨Y, e⟩| ≤ a),

where e is an arbitrary unit vector in R^m.

Let m ≥ 3. Then, according to Theorem 1.8.19, one-dimensional projections of any m-dimensional spherically symmetric distribution are unimodal. Thus, to obtain estimates for the characteristic functions of spherically symmetric distributions, we can use the results of Section 2.4. The following two theorems are based on this idea.

THEOREM 2.7.16. Let X and Y be spherically symmetric random vectors in R^m, m ≥ 3, with characteristic functions f(t) and g(t) respectively. If Y is bounded, ||Y|| ≤ c, and

P(||X|| ≤ r) ≤ P(||Y|| ≤ r)

for all r > 0, then

|f(t)| ≤ g(t)

for ||t|| ≤ π/(2c).

PROOF. Due to the symmetry, it suffices to prove that

|f_e(t)| ≤ g_e(t),  |t| ≤ π/(2c),    (2.7.29)

for some unit vector e ∈ R^m. Consider the unit vector e = (1, 0, …, 0), and denote by X₁ and Y₁ the random variables ⟨X, e⟩ and ⟨Y, e⟩, whose characteristic functions are f_e(t) and g_e(t) (see Section 1.8). From Lemma 2.7.3 it follows that

P(|X₁| ≤ a) ≤ P(|Y₁| ≤ a)

for any a > 0. In addition, by virtue of Theorem 1.8.19, X₁ has a unimodal distribution, and obviously |Y₁| ≤ c. Therefore, making use of Theorem 2.4.2, we arrive at (2.7.29).

Let us consider a spherically symmetric random vector X. Assume that the probability P(||X|| ≤ r) is known or can be estimated from above for some r > 0:

P(||X|| ≤ r) ≤ p,  0 < p < 1.

Then

P(||X|| ≤ a) ≤ P(||Y|| ≤ a)

for all a > 0, where Y is a random vector having the spherically symmetric distribution of the form

G = pE₀ + (1 − p)U^(r)

(E₀ is the distribution degenerate at zero and U^(r) is the uniform distribution on the surface of the m-dimensional sphere of radius r centered at zero). The characteristic function of Y is (in view of (1.8.6))

g(t) = p + (1 − p) Γ(m/2) (2/(r||t||))^(m/2−1) J_(m/2−1)(r||t||).

On the other hand, the random vectors X and Y obviously satisfy the conditions of Theorem 2.7.16. We thus come to the following assertion.

COROLLARY 2.7.4. Let X be a spherically symmetric random vector with characteristic function f(t). If

P(||X|| ≤ r) ≤ p,  0 < p < 1,

for some r > 0, then

|f(t)| ≤ p + (1 − p) Γ(m/2) (2/(r||t||))^(m/2−1) J_(m/2−1)(r||t||)

for ||t|| ≤ π/(2r).
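For m = 3 the Bessel factor reduces to sin(u)/u, which makes Corollary 2.7.4 easy to check numerically. Below (our illustration) X is uniform in the ball of radius R > r, so that P(||X|| ≤ r) = (r/R)³ = p exactly.

```python
import math

# X uniform in the ball of radius R in R^3; P(||X|| <= r) = (r/R)^3 = p,
# and for m = 3 the corollary reads |f(t)| <= p + (1-p) sin(rt)/(rt).
R, r = 2.0, 1.0
p = (r / R) ** 3

def f(t):                      # cf of the uniform ball of radius R, m = 3
    u = R * t
    return 3 * (math.sin(u) - u * math.cos(u)) / u ** 3

def bound(t):
    return p + (1 - p) * math.sin(r * t) / (r * t)

tmax = math.pi / (2 * r)
ok = all(abs(f(i * tmax / 40)) <= bound(i * tmax / 40) + 1e-12
         for i in range(1, 41))
```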

The estimate given by the corollary is obviously sharp. Another consequence of Theorem 2.7.16 is the following.

THEOREM 2.7.17. Let X be a spherically symmetric random vector with bounded density function p(x) and characteristic function f(t). If

sup_{x∈R^m} p(x) ≤ a,

then

|f(t)| ≤ Γ(m/2 + 1) (2/(r_a||t||))^(m/2) J_(m/2)(r_a||t||)

for ||t|| ≤ π/(2r_a), where r_a is the radius of the ball of volume 1/a:

r_a = (1/√π) (Γ(m/2 + 1)/a)^(1/m).

This estimate immediately follows from Theorem 2.7.16 if as Y we take a random vector uniformly distributed in the ball of volume 1/a with center at zero.

2.8. Inequalities involving integrals of characteristic functions

In this section, we present a smoothing inequality of a general form and some of its consequences, as well as some inequalities connecting characteristic and concentration functions.

In what follows, the symbol v.p. stands for the Cauchy principal value of an integral, i.e.,

v.p. ∫_{−1}^{1} = lim_{h→+0} (∫_{−1}^{−h} + ∫_{h}^{1}).
THEOREM 2.8.1. Let K(t) be a complex-valued function satisfying the conditions:

(i) the function ℜK(t) is even, and the function ℑK(t) is odd;

(ii) ℜK(t) is integrable in the interval (−1, 1);

(iii) there exists a constant b such that the function ℑK(t) − b/t is integrable in (−1, 1);

(iv) for any x,

v.p. ∫_{−1}^{1} e^{−itx} K(t) dt ≥ E(x) − 1/2,

where E(x) is the degenerate (at zero) distribution function: E(x) = 0 for x < 0 and E(x) = 1 for x ≥ 0.

Then for any T > 0 and any distribution function F(x) with characteristic function f(t), the inequalities

F(x + 0) ≤ 1/2 + v.p. ∫_{−T}^{T} e^{−itx} (1/T) K(t/T) f(t) dt    (2.8.1)

and

F(x − 0) ≥ 1/2 − v.p. ∫_{−T}^{T} e^{−itx} (1/T) K(−t/T) f(t) dt    (2.8.2)

are true.

PROOF. It is easy to see that we only need to prove the first inequality: the second one is its consequence. Indeed, the distribution function corresponding to the characteristic function g(t) = f(−t) is G(x) = 1 − F(−x − 0); therefore, in view of (2.8.1),

G(−x + 0) ≤ 1/2 + v.p. ∫_{−T}^{T} e^{itx} (1/T) K(t/T) f(−t) dt,

which is obviously equivalent to (2.8.2).

Taking conditions (i)–(iii) of the theorem into account, substituting f(t) = ∫_{−∞}^{∞} e^{ity} dF(y), and making the change of variable t → Tt, we obtain

v.p. ∫_{−T}^{T} e^{−itx} (1/T) K(t/T) f(t) dt = ∫_{−∞}^{∞} [v.p. ∫_{−1}^{1} e^{−i(x−y)Tt} K(t) dt] dF(y).

Inequality (2.8.1) is hence equivalent to the inequality

∫_{−∞}^{∞} [v.p. ∫_{−1}^{1} e^{−i(x−y)Tt} K(t) dt] dF(y) ≥ F(x + 0) − 1/2.    (2.8.3)

Furthermore,

F(x + 0) = ∫_{−∞}^{∞} E(x − y) dF(y).    (2.8.4)

Relations (2.8.3) and (2.8.4) imply that (2.8.3) is true for every distribution function F(x) if and only if

v.p. ∫_{−1}^{1} e^{−i(x−y)Tt} K(t) dt ≥ E(x − y) − 1/2,

which is obviously equivalent to condition (iv) of the theorem.

The methods of constructing functions K(t) satisfying conditions (i)–(iv) of the theorem, as well as other useful recommendations, can be found in (Prawitz, 1972). Here we just make some general remarks. First of all, we point out the following trivial but useful property of functions satisfying conditions (i)–(iv).

PROPOSITION 2.8.1. Let K^(1)(t) and K^(2)(t) be two functions satisfying conditions (i)–(iv). Then any function of the form

K(t) = aK^(1)(t) + (1 − a)K^(2)(t),

where 0 ≤ a ≤ 1, satisfies conditions (i)–(iv) as well.

Further, let K(t) be a function satisfying conditions (i)–(iv). Denote its real and imaginary parts by K₁(t) and K₂(t) respectively. Then condition (iv) can be written in the form

2 ∫₀^1 [cos(xt)K₁(t) + sin(xt)K₂(t)] dt ≥ E(x) − 1/2.

Introduce the functions

H₁(x) = 2 ∫₀^1 K₁(t) cos(xt) dt,

H₂(x) = 2 ∫₀^1 K₂(t) sin(xt) dt.

Conditions (i)–(iv) can be expressed in terms of the functions H₁(x) and H₂(x) (or their analytic continuations). For instance, condition (iv) is equivalent to the condition

H₁(x) + H₂(x) ≥ E(x) − 1/2,

or, since H₁(x) is an even and H₂(x) is an odd function, to the condition

H₁(x) ≥ |H₂(x) − (1/2) sgn x|.

Thus, the construction of a function K(t) satisfying conditions (i)–(iv) (or the examination of whether a given function satisfies these conditions or not) can be fulfilled via constructing (examining) the functions H₁(x) and H₂(x). Sometimes the latter proves to be easier. Below we give the formulation of a theorem realizing this idea. Its proof is contained in (Prawitz, 1972).
The functions K₁(t) and K₂(t) under consideration can always be continued as entire analytic functions. Let ℋ be the class of those even entire analytic functions H(z), z = x + iy, which satisfy the conditions:

(a) H(z) is real and non-negative on the real axis;

(b) H(z) ≥ 1/2 on the imaginary axis;

(c) there exist positive constants c and h such that |H(z)| ≥ c for |z| ≤ h;

(d) if z = x + iy, then

lim_{|z|→∞} |H(z)| e^{−|y|} log(2 + |z|) = 0;

(e) H(z) is integrable on the real axis (y = 0).

THEOREM 2.8.2. Let H(z) ∈ ℋ and set

* du,.
J-u 2
2ida, J-iociw -x2)H(w)

Then the function

K(t) = i/(2πt) + (1/2) ∫_{−∞}^{∞} [H(x) cos(tx) + iG(x) sin(tx)] dx

satisfies conditions (i)–(iv) and, hence, the inequalities

F(x + 0) ≤ 1/2 + v.p. ∫_{−T}^{T} e^{−itx} (1/T) K(t/T) f(t) dt

and

F(x − 0) ≥ 1/2 − v.p. ∫_{−T}^{T} e^{−itx} (1/T) K(−t/T) f(t) dt

are true for every distribution function F(x) with the characteristic function f(t).

Among the functions K(t) satisfying conditions (i)–(iv), of great interest is the function

K^(0)(t) = (1/2)(1 − |t|) + (i/2)[(1 − |t|) cot(πt) + (sgn t)/π] for |t| ≤ 1,  K^(0)(t) = 0 otherwise.    (2.8.5)

The fact that K^(0)(t) satisfies (i)–(iv) can be verified directly or on the basis of Theorem 2.8.2 (for details, see (Prawitz, 1972)).

Using Theorem 2.8.1 with K(t) = K^(0)(t), one obtains a useful addition to the inversion formula in the form of Theorem 1.2.5. Before formulating it, let us establish the following estimate of the function K^(0)(t). Denote the real and imaginary parts of K^(0)(t) by K₁^(0)(t) and K₂^(0)(t) respectively.

LEMMA 2.8.1. For any real t, the inequality

|1 − 2πtK₂^(0)(t)| ≤ π|t|/2

is true.
2.8. Integrals of characteristic functions 137

PROOF. Without loss of generality we may assume that 0 < t < 1. We have

    |1 − 2πt K₂^(0)(t)| = |1 − πt(1 − t) cot(πt) − πt²|
                       = (πt/sin(πt)) |sin(πt)/(πt) − (1 − t) cos(πt) − t sin(πt)|.

Thus, the lemma will be proved if we prove the inequalities

    sin(πt)/(πt) ≥ cos(πt),    0 < t < 1,                          (2.8.6)
    sin(πt)/(πt) ≥ 1 − t,      0 < t < 1.                          (2.8.7)

Let us prove (2.8.6). First of all, we see that, by inequality (2.1.9), it suffices
to verify that

    sin(πt) − πt cos(πt) ≥ 0,    0 < t < 1.                        (2.8.8)

Consider the function

    v(t) = sin(πt) − πt cos(πt).

If we show that it is non-negative for 0 < t < 1, then, in view of (2.8.8), we
arrive at (2.8.6). We have v(0) = 0 and

    v'(t) = π²t sin(πt) ≥ 0,

i.e., v(t) ≥ 0, t ≥ 0. Thus, (2.8.6) is proved.
    Let us turn to the proof of (2.8.7). Denote

    η(t) = sin(πt)/(πt) − (1 − t),    0 < t < 1.

We have η(0) = η(1) = 0; therefore, (2.8.7) will be proved if we demonstrate
that η(t) is concave in the interval (0,1), i.e., that η''(t) ≤ 0 in this interval.
We have

    η''(t) = (−π²t² sin(πt) − 2πt cos(πt) + 2 sin(πt)) / (πt³),

so, it suffices to prove that

    −π²t² sin(πt) − 2πt cos(πt) + 2 sin(πt) ≤ 0,    0 < t < 1,

that is,

    sin(πt)/(πt) − cos(πt) ≤ (π²t²/2) · sin(πt)/(πt).

The last inequality follows from inequality (2.1.8) and the inequality
sin(πt)/(πt) ≤ 1.
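The bound of Lemma 2.8.1 and the two auxiliary inequalities are easy to probe numerically. The short script below is a sketch, assuming the form of K^(0) given in (2.8.5) and the constant π/2 as reconstructed here; it evaluates both on a fine grid of the interval (0,1).

```python
import math

def K2(t):
    # imaginary part of the kernel K^(0) from (2.8.5), for 0 < t < 1
    return 0.5 * ((1 - t) * math.cos(math.pi * t) / math.sin(math.pi * t) + t)

ts = [k / 1000.0 for k in range(1, 1000)]

# Lemma 2.8.1: |1 - 2*pi*t*K2(t)| <= (pi/2)*t on (0, 1)
assert all(abs(1 - 2 * math.pi * t * K2(t)) <= (math.pi / 2) * t for t in ts)

# inequalities (2.8.6) and (2.8.7)
assert all(math.sin(math.pi * t) / (math.pi * t) >= math.cos(math.pi * t) for t in ts)
assert all(math.sin(math.pi * t) / (math.pi * t) >= 1 - t for t in ts)
print("all kernel inequalities hold on the grid")
```

On this grid the ratio |1 − 2πtK₂^(0)(t)|/t stays well below π/2 (its largest values, near t = 1, are about π − 2).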

THEOREM 2.8.3. Let F(x) be a distribution function with characteristic function
f(t). Then

    F(x) = 1/2 + (i/(2π)) v.p. ∫_{−T}^{T} e^{−itx} (f(t)/t) dt + R

for any T > 0, where the remainder R = R(f, T) satisfies the inequality

    |R| ≤ (3/(4T)) ∫_{−T}^{T} |f(t)| dt.

PROOF. Denote

    I = 1/2 + (i/T) v.p. ∫_{−T}^{T} e^{−itx} K₂^(0)(t/T) f(t) dt,
    J = (1/T) ∫_{−T}^{T} K₁^(0)(t/T) |f(t)| dt.

Then from Theorem 2.8.1 we obtain

    I − J ≤ F(x − 0) ≤ F(x) ≤ F(x + 0) ≤ I + J,

that is,

    F(x) = I + R₀,                                                 (2.8.9)

where

    |R₀| ≤ J.                                                      (2.8.10)

Denote

    R₁ = I − 1/2 − (i/(2π)) v.p. ∫_{−T}^{T} e^{−itx} (f(t)/t) dt.

Then, by virtue of (2.8.9),

    F(x) = 1/2 + (i/(2π)) v.p. ∫_{−T}^{T} e^{−itx} (f(t)/t) dt + R₀ + R₁;

therefore, to prove the theorem it suffices to demonstrate that

    |R₀| + |R₁| ≤ (3/(4T)) ∫_{−T}^{T} |f(t)| dt

or, due to (2.8.10), that

    J + |R₁| ≤ (3/(4T)) ∫_{−T}^{T} |f(t)| dt.                       (2.8.11)

We, obviously, have

    J ≤ (1/(2T)) ∫_{−T}^{T} |f(t)| dt.                              (2.8.12)

Further, using Lemma 2.8.1, we obtain

    |R₁| ≤ (1/(2π)) ∫_{−T}^{T} (|f(t)|/|t|) |1 − 2π(t/T) K₂^(0)(t/T)| dt
         ≤ (1/(4T)) ∫_{−T}^{T} |f(t)| dt.                           (2.8.13)

Bounds (2.8.12) and (2.8.13) yield (2.8.11).
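For a distribution with a known characteristic function, the formula of Theorem 2.8.3 can be checked numerically. The sketch below (assuming the constant 3/(4T) as reconstructed above) does this for the standard normal law at x = 0.5, using the fact that by symmetry the principal-value integral reduces to (1/π) ∫₀ᵀ sin(tx) f(t)/t dt.

```python
import math

f = lambda t: math.exp(-t * t / 2)      # characteristic function of N(0,1)
x, T, m = 0.5, 10.0, 200000

# 1/2 + (i/(2*pi)) v.p. int_{-T}^{T} e^{-itx} f(t)/t dt
#     = 1/2 + (1/pi) int_0^T sin(tx) f(t)/t dt   (by symmetry)
h = T / m
approx = 0.5 + (1 / math.pi) * sum(
    math.sin((k + 0.5) * h * x) * f((k + 0.5) * h) / ((k + 0.5) * h) * h
    for k in range(m)
)

exact = 0.5 * (1 + math.erf(x / math.sqrt(2)))   # Phi(0.5)
bound = (3 / (4 * T)) * math.sqrt(2 * math.pi)   # (3/(4T)) * int_{-T}^{T} |f(t)| dt
assert abs(approx - exact) <= bound
print(approx, exact)
```

With T = 10 the actual remainder is far smaller than the guaranteed bound; the theorem only controls the worst case over all distributions.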

Now we establish some relations between characteristic functions and concen-
tration functions. These inequalities are, in some sense, converse to the truncation
inequalities given by Theorem 1.4.8.

THEOREM 2.8.4. Let X be a random variable with characteristic function f(t),
and let h(t) be a real characteristic function satisfying the condition h(t) = 0 for
|t| > 1. Denote by p(x) the probability density function corresponding to h(t).
Then for any λ ≥ 0, a > 0 and b > 0,

    Q_X(λ) ≤ (κ/(2πac)) ∫_{−a}^{a} |f(t) h(t/a)| dt,                (2.8.14)

where

    κ = max{1, λa/b},

    c = min_{0≤x≤b/2} p(x).

PROOF. Denote the distribution function of X by F(x). Taking into account
that h(t) is absolutely integrable and using the inversion theorem for densities
(Theorem 1.2.6), we obtain

    ∫_{−∞}^{∞} p(a(x − y)) dF(x) = (1/(2πa)) ∫_{−a}^{a} e^{ity} h(t/a) ( ∫_{−∞}^{∞} e^{−itx} dF(x) ) dt
                                 ≤ (1/(2πa)) ∫_{−a}^{a} |f(t) h(t/a)| dt        (2.8.15)

for any real y and positive a. If 0 < λa ≤ b, then

    min_{−λ/2≤x≤λ/2} p(ax) ≥ min_{−b/2≤x≤b/2} p(x) = min_{0≤x≤b/2} p(x) = c

(because p(x) is symmetric). We set

    I = {x: y − λ/2 ≤ x ≤ y + λ/2}.

Then

    ∫_{−∞}^{∞} p(a(x − y)) dF(x) ≥ c P(X ∈ I),

and hence, in view of (2.8.15),

    P(X ∈ I) ≤ (1/(2πac)) ∫_{−a}^{a} |f(t) h(t/a)| dt.

Since y is arbitrary, we obtain

    Q_X(λ) ≤ (1/(2πac)) ∫_{−a}^{a} |f(t) h(t/a)| dt                 (2.8.16)

for any positive a and λ satisfying the condition λa ≤ b. Thus (2.8.14) is proved
for such a and λ.
    Setting a = b/λ in (2.8.16), we come to the conclusion that

    Q_X(λ) ≤ (λ/(2πbc)) ∫_{|t|≤b/λ} |f(t) h(tλ/b)| dt

for any positive λ and b. This implies that if λa ≥ b, then

    Q_X(λ) ≤ (λ/(2πbc)) ∫_{−a}^{a} |f(t) h(t/a)| dt.

Inequality (2.8.14) is thus proved for any positive a, b and λ. The concentration
function Q_X(λ) is non-decreasing for λ ≥ 0; therefore (2.8.16) and (2.8.14) are
true for λ = 0 as well.

COROLLARY 2.8.1. Let X be a random variable with characteristic function f(t).
Then

    Q_X(λ) ≤ (96/95)² max{λa, 1} (1/a) ∫_{−a}^{a} (1 − |t|/a) |f(t)| dt    (2.8.17)

for any λ ≥ 0 and a > 0.

PROOF. Setting b = 1 in the theorem, and

    h(t) = { 1 − |t|,  |t| ≤ 1,
           { 0,        |t| > 1,

we obtain

    p(x) = (1/(2π)) (sin(x/2)/(x/2))²,

    c = (1/(2π)) min_{0≤x≤1/2} (sin(x/2)/(x/2))² ≥ (1/(2π)) (95/96)²,

and, therefore, (2.8.14) implies (2.8.17).
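The bound of Corollary 2.8.1 can be probed for a concrete law. The sketch below (with the constants (96/95)² and max{λa, 1} as reconstructed above) compares the two sides for the standard normal distribution, whose concentration function is Q_X(λ) = 2Φ(λ/2) − 1.

```python
import math

lam, a, m = 1.0, 1.0, 20000

# left-hand side: for N(0,1) the concentration function is Q(lam) = 2*Phi(lam/2) - 1
lhs = math.erf(lam / (2 * math.sqrt(2)))

# right-hand side of (2.8.17) with f(t) = exp(-t^2/2), by the midpoint rule
h = 2 * a / m
integral = sum(
    (1 - abs(-a + (k + 0.5) * h) / a) * math.exp(-((-a + (k + 0.5) * h) ** 2) / 2) * h
    for k in range(m)
)
rhs = (96 / 95) ** 2 * max(lam * a, 1.0) / a * integral
assert lhs <= rhs
print(lhs, rhs)
```

Here lhs ≈ 0.38 and rhs ≈ 0.94, so the inequality holds with a comfortable margin at this point.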

COROLLARY 2.8.2. Let X be a random variable with characteristic function f(t).
Then

for any λ ≥ 0 and a > 0.

COROLLARY 2.8.3. Let X be a random variable with characteristic function f(t).
Then

    Q_X(λ) ≤ (4/3) (96/95)⁴ max{λa/2, 1} (1/a) ∫_{−a}^{a} |f(t) h(t/a)| dt   (2.8.18)

for any λ ≥ 0 and a > 0.

PROOF. Setting b = 2 in the theorem, and

    h(t) = { 1 − 6t² + 6|t|³,   |t| ≤ 1/2,
           { 2(1 − |t|)³,       1/2 < |t| ≤ 1,
           { 0,                 |t| > 1,

we obtain

    p(x) = (3/(8π)) (sin(x/4)/(x/4))⁴

(Gnedenko & Kolmogorov, 1954), and

    c ≥ (3/(8π)) (95/96)⁴,

and therefore, (2.8.14) implies (2.8.18).

Some extensions of Theorem 2.8.4 are contained in (Salikhov, 1996). There
are multi-dimensional analogs of the estimate given by Theorem 2.8.4 (see,
e.g., (Esseen, 1968; Larin, 1993)).

2.9. Inequalities involving differences of
     characteristic functions

In this section, we study two problems involving differences of characteris-
tic functions. The main objective of the section is to present some recent
results concerning the problem of estimating the differences between distribu-
tion functions via the differences of the corresponding characteristic functions.
In addition, some inequalities will be given concerning the estimation of the
closeness of characteristic functions by the closeness of their powers.
    Throughout the section K(t) = K₁(t) + iK₂(t) denotes a complex-valued func-
tion satisfying the conditions

(i) the function K₁(t) is even and K₂(t) is odd;

(ii) K₁(t) is integrable in the interval (−1,1);

(iii) there exists a constant c such that the function K₂(t) − c/t is integrable
in (−1,1);

(iv) for any x,

    2 ∫₀¹ [K₁(t) cos(xt) + K₂(t) sin(xt)] dt ≥ E(x) − 1/2,

where E(x) is the distribution function degenerated at zero.

In other words, K(t) is supposed to satisfy the conditions of Theorem 2.8.1.
For these functions, smoothing inequalities (2.8.1) and (2.8.2) are true for
every distribution function F(x) with characteristic function f(t). On the basis
of these inequalities, we obtain a series of estimates of the uniform distance
between distribution functions in terms of the differences of their characteristic
functions. We start with some general estimates.
    Let F(x) and G(x) be arbitrary distribution functions with characteristic
functions f(t) and g(t). From Theorem 2.8.1 we obtain

    F(x) − G(x) ≤ F(x + 0) − G(x − 0)
               ≤ v.p. (1/T) ∫_{−T}^{T} e^{−ixt} [K(t/T) f(t) + K(−t/T) g(t)] dt   (2.9.1)

and

    F(x) − G(x) ≥ F(x − 0) − G(x + 0)
               ≥ −v.p. (1/T) ∫_{−T}^{T} e^{−ixt} [K(−t/T) f(t) + K(t/T) g(t)] dt  (2.9.2)

for any T > 0, which immediately yield the following assertion.

THEOREM 2.9.1. Let the function K(t) = K₁(t) + iK₂(t) satisfy conditions (i)–(iv).
Then for any distribution functions F(x) and G(x) with characteristic functions
f(t) and g(t), the inequality

    sup_x |F(x) − G(x)| ≤ (1/T) ∫_{−T}^{T} |K₂(t/T)| |f(t) − g(t)| dt
                        + (1/T) ∫_{−T}^{T} |K₁(t/T)| |f(t) + g(t)| dt              (2.9.3)

is true for any T > 0.

Rewriting (2.9.1) and (2.9.2) as

    F(x) − G(x) ≤ v.p. (1/T) ∫_{−T}^{T} e^{−ixt} K(t/T) [f(t) − g(t)] dt
               + (2/T) ∫_{−T}^{T} e^{−ixt} K₁(t/T) g(t) dt

and

    F(x) − G(x) ≥ −v.p. (1/T) ∫_{−T}^{T} e^{−ixt} K(−t/T) [f(t) − g(t)] dt
               − (2/T) ∫_{−T}^{T} e^{−ixt} K₁(t/T) g(t) dt,

we obtain the following estimate, which is sometimes preferable to
(2.9.3) (it is better, for instance, when |f(t)| has a 'heavy tail' — slowly tends to
zero as |t| → ∞ or does not tend to zero at all — while |g(t)| tends to zero as |t| → ∞
sufficiently fast).

THEOREM 2.9.2. Let the function K(t) = K₁(t) + iK₂(t) satisfy conditions (i)–(iv).
Then for any distribution functions F(x) and G(x) with characteristic functions
f(t) and g(t), the inequality

    sup_x |F(x) − G(x)| ≤ (1/T) ∫_{−T}^{T} |K(t/T)| |f(t) − g(t)| dt
                        + (2/T) ∫_{−T}^{T} |K₁(t/T)| |g(t)| dt                     (2.9.4)

is true for any T > 0.



Useful estimates are obtained if inequalities (2.8.1) and (2.8.2) of The-
orem 2.8.1 are combined with the inversion formula in the form of Theo-
rem 1.2.5. Let F(x) and G(x) be distribution functions with characteristic
functions f(t) and g(t). Denote

    I₁(x) = v.p. (1/T) ∫_{−T}^{T} e^{−ixt} K(t/T) f(t) dt,
    I₂(x) = v.p. (1/T) ∫_{−T}^{T} e^{−ixt} K(−t/T) f(t) dt,
    J(x) = (i/(2π)) v.p. ∫_{−∞}^{∞} e^{−ixt} (g(t)/t) dt,

where, in the last integral, the principal value is taken both at the origin and
at the infinity, that is,

    v.p. ∫_{−∞}^{∞} = lim_{H→∞} lim_{h→0} ( ∫_{−H}^{−h} + ∫_{h}^{H} ).

Then we obtain from Theorems 1.2.5 and 2.8.1

    −I₂(x) − J(x) ≤ F(x) − G(x) ≤ I₁(x) − J(x),

which yields

    |F(x) − G(x)| ≤ max{|I₁(x) − J(x)|, |I₂(x) + J(x)|}.

But

    I₁(x) − J(x) = v.p. (1/T) ∫_{−T}^{T} e^{−ixt} [K(t/T) f(t) − (iT/(2πt)) g(t)] dt
                 − (i/(2π)) v.p. ∫_{|t|>T} e^{−ixt} (g(t)/t) dt

and

    I₂(x) + J(x) = v.p. (1/T) ∫_{−T}^{T} e^{−ixt} [K(−t/T) f(t) + (iT/(2πt)) g(t)] dt
                 + (i/(2π)) v.p. ∫_{|t|>T} e^{−ixt} (g(t)/t) dt;

therefore,

    |F(x) − G(x)| ≤ (1/T) ∫_{−T}^{T} |K(t/T) f(t) − (iT/(2πt)) g(t)| dt
                  + (1/(2π)) ∫_{|t|>T} |g(t)/t| dt.                               (2.9.5)
2.9. Differences of characteristic functions 145

This relation will be used below together with inequalities of Theorems 2.9.1
and 2.9.2.
A number of useful inequalities that supplement (and often improve) Theo-
rem 1.4.9 can be derived from Theorems 2.9.1 and 2.9.2 (as well as from relation
(2.9.5)) by substituting K(t) = K^(0)(t), where K^(0)(t) is defined by (2.8.5). Before
formulating some of them, we prove some simple relations for the function
K^(0)(t) (in addition to Lemma 2.8.1, which will also be used). As in the previous
section, K₁^(0)(t) and K₂^(0)(t) denote, respectively, the real and imaginary parts of
K^(0)(t).

LEMMA 2.9.1. For all real t, 0 < |t| ≤ 1,

    |K₂^(0)(t)| ≤ |cos(πt)|/(2π|t|) + |t|/2.

PROOF. Without loss of generality, we assume that 0 < t < 1. Using inequality
(2.8.7), we obtain

    |K₂^(0)(t)| ≤ (1/2)(1 − t)|cot(πt)| + t/2
              ≤ (1/2) · (sin(πt)/(πt)) · (|cos(πt)|/sin(πt)) + t/2
              = |cos(πt)|/(2πt) + t/2.

LEMMA 2.9.2. For all real t, 0 < |t| ≤ 1,

    |K^(0)(t)| ≤ 1/2 + 1/(π|t|).                                    (2.9.6)

PROOF. Since |K^(0)(t)| is an even function and K^(0)(t) = 0 for |t| > 1, without
loss of generality, we may assume that 0 < t < 1. Using inequality (2.8.7) as in
the proof of Lemma 2.9.1, we obtain

    |K^(0)(t)|² = [K₁^(0)(t)]² + [K₂^(0)(t)]²
               ≤ (1/4)(1 − t)² + (|cos(πt)|/(2πt) + t/2)²
               ≤ (1/2 + 1/(πt))²,

which implies (2.9.6).



THEOREM 2.9.3. Let F(x) and G(x) be two distribution functions with charac-
teristic functions f(t) and g(t). Then for any 0 ≤ p ≤ 1 and any positive T, the
inequality

    sup_x |F(x) − G(x)| ≤ ((1+p)/(2π)) ∫_{−T}^{T} |(f(t) − g(t))/t| dt
                        + (1/(2T)) (1 + (1−p)/π) ∫_{−T}^{T} (|f(t)| + |g(t)|) dt   (2.9.7)

is true.

PROOF. Setting K(t) = K^(0)(t) in Theorem 2.9.1, using Lemma 2.9.1 and taking
into account the obvious bound K₁^(0)(t) ≤ 1/2, we obtain

    sup_x |F(x) − G(x)| ≤ (1/(2π)) ∫_{−T}^{T} |cos(πt/T)| |(f(t) − g(t))/t| dt
                        + (1/(2T²)) ∫_{−T}^{T} |t| |f(t) − g(t)| dt
                        + (1/(2T)) ∫_{−T}^{T} (|f(t)| + |g(t)|) dt,               (2.9.8)

which implies (2.9.7): it remains to split the first two summands with the
parameter p, using the bounds |cos(πt/T)| ≤ 1 and |t|/T ≤ 1, and to apply the
inequality

    ∫_{−T}^{T} |f(t) − g(t)| dt ≤ ∫_{−T}^{T} (|f(t)| + |g(t)|) dt.

Two extreme cases (p = 0 and p = 1) are given below separately.

COROLLARY 2.9.1. Let F(x) and G(x) be two distribution functions with charac-
teristic functions f(t) and g(t). Then for any positive T the inequalities

    sup_x |F(x) − G(x)| ≤ (1/(2π)) ∫_{−T}^{T} |(f(t) − g(t))/t| dt
                        + (1/(2T)) (1 + 1/π) ∫_{−T}^{T} (|f(t)| + |g(t)|) dt       (2.9.9)

and

    sup_x |F(x) − G(x)| ≤ (1/π) ∫_{−T}^{T} |(f(t) − g(t))/t| dt
                        + (1/(2T)) ∫_{−T}^{T} (|f(t)| + |g(t)|) dt                  (2.9.10)

are true.

Similarly (setting K(t) = K^(0)(t)), from Theorem 2.9.2 we obtain the follow-
ing bound.

THEOREM 2.9.4. Let F(x) and G(x) be two distribution functions with charac-
teristic functions f(t) and g(t). Then for any positive T,

    sup_x |F(x) − G(x)| ≤ (1/2 + 1/π) ∫_{−T}^{T} |(f(t) − g(t))/t| dt
                        + (1/T) ∫_{−T}^{T} |g(t)| dt.                               (2.9.11)

The assertion of the theorem immediately follows from Theorem 2.9.2 and
Lemma 2.9.2.

REMARK 2.9.1. An estimate similar to (2.9.9) can also be obtained from Theo-
rem 2.8.3:

    sup_x |F(x) − G(x)| = sup_x |(i/(2π)) v.p. ∫_{−T}^{T} e^{−ixt} ((f(t) − g(t))/t) dt + R_f − R_g|
        ≤ (1/(2π)) ∫_{−T}^{T} |(f(t) − g(t))/t| dt + (3/(4T)) ∫_{−T}^{T} (|f(t)| + |g(t)|) dt.

However, here the constant 3/4 at the second summand on the right-hand side
is somewhat worse than that in (2.9.9): (1/2)(1 + 1/π) ≈ 0.65915.

REMARK 2.9.2. We can improve the constant at the first summand on the right-
hand side of (2.9.11) at the cost of introducing an additional summand involving
the integral of the difference |f(t) − g(t)|. For instance, together with (2.9.11),
the following inequality is true: for any 0 ≤ p ≤ 1,

    sup_x |F(x) − G(x)| ≤ ((1+p)/(2π)) ∫_{−T}^{T} |(f(t) − g(t))/t| dt
                        + (1/(2T)) (1 + (1−p)/π) ∫_{−T}^{T} |f(t) − g(t)| dt
                        + (1/T) ∫_{−T}^{T} |g(t)| dt.                               (2.9.12)

To prove this inequality, notice that for |t| ≤ 1,

    |K^(0)(t)| ≤ |K₁^(0)(t)| + |K₂^(0)(t)|
             ≤ (1/2)(1 − |t|) + (1/2)(1 − |t|)|cot(πt)| + |t|/2.

Now estimate the factor (1 − |t|) in the second summand by sin(π|t|)/(π|t|)
(inequality (2.8.7)); then

    |K^(0)(t)| ≤ (1/2)(1 − |t|) + |cos(πt)|/(2π|t|) + |t|/2.

Substituting this inequality into (2.9.4), splitting the resulting terms with the
parameter p, and estimating |K₁^(0)(t)| by 1/2, we obtain (2.9.12).

Now let us present one more bound. It is based on inequality (2.9.5).
Sometimes it is better than bounds (2.9.7), (2.9.11), and (2.9.12).

THEOREM 2.9.5. Let F(x) and G(x) be two distribution functions with charac-
teristic functions f(t) and g(t). Then for any positive T,

    sup_x |F(x) − G(x)| ≤ (1/(2π)) ∫_{−T}^{T} |(f(t) − g(t))/t| dt
                        + (3/(4T)) ∫_{−T}^{T} |f(t)| dt
                        + (1/(2π)) ∫_{|t|>T} |g(t)/t| dt.                           (2.9.13)

PROOF. Setting K(t) = K^(0)(t) in (2.9.5), we obtain

    |F(x) − G(x)| ≤ (1/T) ∫_{−T}^{T} |K^(0)(t/T) f(t) − (iT/(2πt)) g(t)| dt
                  + (1/(2π)) ∫_{|t|>T} |g(t)/t| dt.

Let us estimate the first integral on the right-hand side. We have

    (1/T) ∫_{−T}^{T} |K^(0)(t/T) f(t) − (iT/(2πt)) g(t)| dt ≤ I₁ + I₂,

where

    I₁ = (1/(2π)) ∫_{−T}^{T} |(f(t) − g(t))/t| dt,

and, by virtue of Lemma 2.8.1,

    I₂ = (1/T) ∫_{−T}^{T} |K^(0)(t/T) − iT/(2πt)| |f(t)| dt
       ≤ (1/(2T)) ∫_{−T}^{T} |f(t)| dt
         + (1/(2π)) ∫_{−T}^{T} (|f(t)|/|t|) |1 − 2π(t/T) K₂^(0)(t/T)| dt
       ≤ (3/(4T)) ∫_{−T}^{T} |f(t)| dt.

So, finally we obtain (2.9.13).
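A bound of the form (2.9.13) can be tested numerically for a pair of concrete distributions. The sketch below (assuming the constants as reconstructed above) compares the two normal laws N(0,1) and N(0,1.2²), for which the left-hand side is computable from the normal distribution function.

```python
import math

f = lambda t: math.exp(-t * t / 2)          # c.f. of N(0,1)
g = lambda t: math.exp(-1.44 * t * t / 2)   # c.f. of N(0,1.2^2)
Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))

# left-hand side: sup_x |F(x) - G(x)| evaluated on a grid
lhs = max(abs(Phi(x) - Phi(x / 1.2)) for x in [k / 100.0 for k in range(-500, 501)])

def mid(F, lo, hi, m=20000):
    # simple midpoint rule
    h = (hi - lo) / m
    return sum(F(lo + (k + 0.5) * h) for k in range(m)) * h

T = 5.0
rhs = (1 / math.pi) * mid(lambda t: (f(t) - g(t)) / t, 1e-9, T) \
    + (3 / (4 * T)) * mid(f, -T, T) \
    + (1 / math.pi) * mid(lambda t: g(t) / t, T, 50.0)
assert lhs <= rhs
print(lhs, rhs)
```

Here lhs ≈ 0.04 while rhs ≈ 0.4: the bound is loose for smooth laws, since the middle summand decays only like 1/T.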



One often needs to estimate the closeness of characteristic functions by
the closeness of their powers. This problem arises, in particular, in various
problems related to stability. In the remainder of this section, we present
several estimates of this kind.
    Of course, in the general case, the closeness of some powers of charac-
teristic functions does not imply the closeness of the characteristic functions
themselves. In fact, this closeness does not follow even from the identity of
powers (see Example 16 of Appendix A). Some essential constraints should be
imposed.

PROPOSITION 2.9.1. Let f(t) and g(t) be two characteristic functions, and let the
absolute value of g(t) be non-increasing for t ≥ 0. Then, if

    |f^n(t) − g^n(t)| ≤ ε < 1                                      (2.9.14)

for |t| ≤ T, n ≥ 2, then

    |f(t) − g(t)| ≤ (√3 + √2) ε^{1/n}                              (2.9.15)

for |t| ≤ T.

PROOF. We set

    t₀ = min{t > 0: |g^n(t)| = 2ε},    T₀ = min{t₀, T},

and prove first (2.9.15) for |t| ≤ T₀. From (2.9.14), the definition of T₀, and the
elementary inequality

    ||f(t)|^n − |g(t)|^n| ≤ |f^n(t) − g^n(t)|,

we obtain

    |f^n(t)| ≥ ε,    |g^n(t)| ≥ 2ε,    |t| ≤ T₀.                    (2.9.16)

Thus, neither f^n(t) nor g^n(t) is equal to zero for |t| ≤ T₀ and, hence, in view of
the continuity of f(t) and g(t), the representations

    f^n(t) = |f(t)|^n e^{iφ(t)},    g^n(t) = |g(t)|^n e^{iψ(t)},    |t| ≤ T₀,

are true, where φ(t) and ψ(t) are continuous functions such that

    φ(0) = ψ(0) = 0.                                               (2.9.17)

It is easy to see that

    |φ(t) − ψ(t)| ≤ π/3                                            (2.9.18)

for |t| ≤ T₀, because, if (2.9.18) does not hold, then, due to the continuity of the
functions φ(t) and ψ(t) and because of the validity of (2.9.17), there exists t₁ > 0
(t₁ ≤ T₀) such that

    π/3 ≤ |φ(t₁) − ψ(t₁)| ≤ π/2.

But then

    |f^n(t₁) − g^n(t₁)|² = |f^n(t₁)|² + |g^n(t₁)|² − 2|f^n(t₁)||g^n(t₁)| cos(φ(t₁) − ψ(t₁))
                        ≥ |f^n(t₁)|² + |g^n(t₁)|² − |f^n(t₁)||g^n(t₁)| ≥ 3ε²,

which contradicts (2.9.14). So, (2.9.18) holds. Furthermore,

    |f^n(t) − g^n(t)|² = |f^n(t)|² + |g^n(t)|² − 2|f^n(t)||g^n(t)| cos(φ(t) − ψ(t))
                      = (|f^n(t)| − |g^n(t)|)² + 2|f^n(t)||g^n(t)| [1 − cos(φ(t) − ψ(t))]
                      = (|f^n(t)| − |g^n(t)|)² + 4|f^n(t)||g^n(t)| sin²((φ(t) − ψ(t))/2);

hence,

    sin²((φ(t) − ψ(t))/2) ≤ |f^n(t) − g^n(t)|² / (4|f^n(t)||g^n(t)|)
                          ≤ ε² / (4|f^n(t)||g^n(t)|).               (2.9.19)

Let us use the elementary inequality

    |x| ≤ (π/3) |sin x|,

which is true for |x| ≤ π/6. Then from (2.9.18) and (2.9.19) we obtain

    |φ(t) − ψ(t)| ≤ (π/3) · ε / (|f(t)|^n |g(t)|^n)^{1/2}.           (2.9.20)

Furthermore,

    ||f(t)|^n − |g(t)|^n| = ||f(t)| − |g(t)|| (|f(t)|^{n−1} + |f(t)|^{n−2}|g(t)| + ⋯ + |g(t)|^{n−1}) ≤ ε,

which together with the inequality between the arithmetic and geometric
means yields

    ||f(t)| − |g(t)|| ≤ ε / (n (|f(t)|^n |g(t)|^n)^{(n−1)/(2n)}).     (2.9.21)

From (2.9.16), (2.9.20), (2.9.21), and the elementary inequality |sin x| ≤ |x|, we
obtain

    |f(t) − g(t)|² = ||f(t)| e^{iφ(t)/n} − |g(t)| e^{iψ(t)/n}|²
                  = (|f(t)| − |g(t)|)² + 4|f(t)||g(t)| sin²((φ(t) − ψ(t))/(2n))
                  ≤ (|f(t)| − |g(t)|)² + |f(t)||g(t)| (φ(t) − ψ(t))²/n²
                  ≤ (1 + π²/9) ε² / (n² (|f(t)|^n |g(t)|^n)^{(n−1)/n})
                  ≤ (1 + π²/9) ε² / (n² (2ε²)^{(n−1)/n})
                  ≤ (1 + π/3)² ε^{2/n}.

Thus,

    |f(t) − g(t)| ≤ (1 + π/3) ε^{1/n}.                              (2.9.22)

If T₀ = T, the proposition is proved because of the inequality

    1 + π/3 ≤ √3 + √2,    n ≥ 2.

If T₀ < |t| ≤ T, then, by the definition of T₀ and the monotonicity of |g(t)|,

    |g^n(t)| ≤ 2ε,    |f^n(t)| ≤ |g^n(t)| + ε ≤ 3ε,

which yields

    |g(t)| ≤ (2ε)^{1/n},    |f(t)| ≤ (3ε)^{1/n};

hence,

    |f(t) − g(t)| ≤ |f(t)| + |g(t)| ≤ (√3 + √2) ε^{1/n}.

Combining the last inequality for T₀ < |t| ≤ T and inequality (2.9.22) for
|t| ≤ T₀, we finally obtain

    |f(t) − g(t)| ≤ (√3 + √2) ε^{1/n},    |t| ≤ T.
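Proposition 2.9.1 is easy to illustrate numerically. The sketch below takes two Gaussian characteristic functions (both with non-increasing modulus, so the hypothesis is satisfied), computes ε = max|fⁿ − gⁿ| on a grid of [0, T], and checks inequality (2.9.15).

```python
import math

n, T = 3, 2.0
f = lambda t: math.exp(-t * t / 2)          # c.f. of N(0,1)
g = lambda t: math.exp(-1.21 * t * t / 2)   # c.f. of N(0,1.1^2)

# both c.f.s are even and positive, so it suffices to scan t >= 0
grid = [k * T / 1000.0 for k in range(1001)]
eps = max(abs(f(t) ** n - g(t) ** n) for t in grid)
worst = max(abs(f(t) - g(t)) for t in grid)

assert worst <= (math.sqrt(3) + math.sqrt(2)) * eps ** (1.0 / n)
print(worst, (math.sqrt(3) + math.sqrt(2)) * eps ** (1.0 / n))
```

The constant √3 + √2 ≈ 3.15 and the exponent 1/n make the bound far from tight in regular cases, but it is uniform over all pairs satisfying the hypotheses.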



Let us consider the distance λ(f, g) in the space of characteristic functions
(its definition was given in Section 1.4; for basic properties, see (Zolotarev,
1970; Zolotarev, 1971; Zolotarev & Senatov, 1975)). Making use of Proposi-
tion 2.9.1, we obtain the following estimate.

THEOREM 2.9.6. Let f(t) and g(t) be two characteristic functions, and let the
absolute value of g(t) be non-increasing for t ≥ 0. Then

    λ(f, g) ≤ √2 (√3 + √2) [λ(f^n, g^n)]^{1/n}.

PROOF. We set

    ε = λ(f^n, g^n).

Then there exists T₀ > 0 such that

    |f^n(t) − g^n(t)| ≤ 2ε

for |t| ≤ T₀ and 1/T₀ ≤ 2ε. Using Proposition 2.9.1, we obtain

    |f(t) − g(t)| ≤ (√3 + √2) 2^{1/n} ε^{1/n} ≤ (√3 + √2) √2 ε^{1/n}

for |t| ≤ T₀. Besides, since ε ≤ 1, the inequalities

    1/T₀ ≤ 2ε ≤ √2 (√3 + √2) ε^{1/n}

are true; thus,

    λ(f, g) = min_T max{ max_{|t|≤T} |f(t) − g(t)|, 1/T } ≤ √2 (√3 + √2) ε^{1/n}.

If one of the characteristic functions is separated from zero, then a more accu-
rate bound holds.

PROPOSITION 2.9.2. Let f(t) and g(t) be two characteristic functions such that

    |f(t)|² ≥ λ > 0

and

    |f²(t) − g²(t)| ≤ λ/2.                                         (2.9.23)

Then

    |f(t) − g(t)| ≤ c(λ) |f²(t) − g²(t)|,

where

    c(λ) = (π(√2 + 1) + 6√λ) / (3√2 λ (√2 + 1))

is a constant depending only on λ.

PROOF. We have

    |g(t)|² ≥ |f(t)|² − |f²(t) − g²(t)| ≥ λ/2,

which yields

    |f(t)| ≥ √λ,    |g(t)| ≥ √(λ/2).                                (2.9.24)

Let us demonstrate that

    |arg f²(t) − arg g²(t)| ≤ π/3.                                  (2.9.25)

If (2.9.25) does not hold, then there exists t₁ > 0 such that π/3 ≤ |arg f²(t₁) −
arg g²(t₁)| ≤ π/2. But then

    |f²(t₁) − g²(t₁)|² = |f(t₁)|⁴ + |g(t₁)|⁴ − 2|f(t₁)|²|g(t₁)|² cos(arg f²(t₁) − arg g²(t₁))
                      ≥ |f(t₁)|⁴ + |g(t₁)|⁴ − |f(t₁)|²|g(t₁)|²
                      = (|f(t₁)|² − |g(t₁)|²)² + |f(t₁)|²|g(t₁)|² ≥ λ²/2 > (λ/2)²,

which contradicts (2.9.23). Thus, (2.9.25) is true.
    Furthermore,

    |f²(t) − g²(t)|² = |f(t)|⁴ + |g(t)|⁴ − 2|f(t)|²|g(t)|² cos(arg f²(t) − arg g²(t))
                    = (|f(t)|² − |g(t)|²)² + 2|f(t)|²|g(t)|² (1 − cos(arg f²(t) − arg g²(t)))
                    ≥ 4|f(t)|²|g(t)|² sin²((arg f²(t) − arg g²(t))/2),

which yields

    |sin((arg f²(t) − arg g²(t))/2)| ≤ |f²(t) − g²(t)| / (2|f(t)||g(t)|).   (2.9.26)

Let us use the elementary inequality

    |x| ≤ (π/3) |sin x|,

which is true for |x| ≤ π/6. Then from (2.9.25) and (2.9.26) we obtain

    |arg f(t) − arg g(t)| = |arg f²(t) − arg g²(t)|/2
                         ≤ (π/3) |f²(t) − g²(t)| / (2|f(t)||g(t)|)
                         ≤ (π/(3√2 λ)) |f²(t) − g²(t)|.              (2.9.27)

And, finally, from (2.9.24), (2.9.27), and Lemma 2.1.14 it follows that

    |f(t) − g(t)| ≤ ((π(√2 + 1) + 6√λ) / (3√2 λ (√2 + 1))) |f²(t) − g²(t)|.


It can be easily seen that both Propositions 2.9.1 and 2.9.2 are true not
only for characteristic functions but for any continuous functions f₁(t) and
f₂(t) satisfying the conditions |f₁(t)| ≤ 1 and |f₂(t)| ≤ 1.

2.10. Estimates for the first positive zero of a
      characteristic function

Lower bounds for the first (the smallest positive) zero of a characteristic func-
tion and/or its real part are of great importance because in many statistical
procedures based on the empirical characteristic function, the 'working inter-
val' is a neighborhood of the origin which does not contain these zeros (see
the next chapter). The methods of statistical estimation of the first zero will
be considered in Chapter 3. Here we present several inequalities giving some
preliminary information about its position. The main source of such inequali-
ties are lower bounds for characteristic functions and various combinations of
these bounds.
    Let F(x) be a distribution function with characteristic function f(t). De-
note the first positive zero of Re f(t) and that of f(t) by r₀ and R₀, respectively.
Obviously, R₀ ≥ r₀.

THEOREM 2.10.1. Let F(x) have finite variance σ². Then

    r₀ ≥ π/(2σ).

The theorem is a consequence of the following more general result.
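The bound is easy to check against distributions whose characteristic functions are known in closed form; the sketch below does this for two standard examples.

```python
import math

# Uniform(-1, 1): f(t) = sin(t)/t is real, variance sigma^2 = 1/3,
# and the first zero of Re f is at pi
sigma = math.sqrt(1 / 3)
r0 = math.pi
assert r0 >= math.pi / (2 * sigma)

# Symmetric two-point law on {-1, +1}: f(t) = cos(t), variance 1,
# first zero of Re f at pi/2 -- here the bound pi/(2*sigma) is attained
assert math.pi / 2 >= math.pi / (2 * 1.0)
print("bounds verified; the two-point law attains equality")
```

The two-point example shows that the constant π/2 in the theorem cannot be improved.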

LEMMA 2.10.1. Let f(t) be the characteristic function corresponding to a dis-
tribution with finite variance σ². Suppose that 0 ≤ a < 1, and let t_a be the
minimal positive number such that |f(t_a)| = a:

    t_a = min{t > 0: |f(t)| = a}                                    (2.10.1)

(if |f(t)| > a for all t > 0, then we set t_a = ∞). Then

    t_a ≥ (arccos a)/σ.

PROOF. Without loss of generality, we assume that t_a < ∞. For any natural n,
we set

    φ_n = arccos |f(t_a/2^n)|.                                      (2.10.2)

By virtue of Theorem 1.5.3, we have

    |f(t_a/2^n)| = 1 − σ²t_a²/(2·4^n) + o(4^{−n}),    n → ∞;

therefore

    φ_n = σ t_a/2^n + o(2^{−n}),    n → ∞.                          (2.10.3)

Since cos φ_n = |f(t_a/2^n)|, making use of Corollary 1.4.2, we obtain

    a = |f(t_a)| ≥ cos(2^n φ_n),

which yields

    2^n φ_n ≥ arccos a.                                             (2.10.4)

Furthermore,

    cos φ_n = 1 − φ_n²/2 + o(φ_n²) = 1 − σ²t_a²/(2·4^n) + o(4^{−n}),    n → ∞.   (2.10.5)

From (2.10.2)–(2.10.5) we obtain

    1 − σ²t_a²/(2·4^n) + o(4^{−n}) = cos φ_n ≤ cos((arccos a)/2^n)
                                  = 1 − (arccos a)²/(2·4^n) + o(4^{−n}),

which yields

    t_a² ≥ (arccos a)²/σ².

THEOREM 2.10.2. Let F(x) have finite first absolute moment β₁. Then

    R₀ ≥ 1/β₁.

The bound given by Theorem 2.10.2 immediately follows from Theo-
rem 2.3.5.

THEOREM 2.10.3. Let F(x) have finite variance cr2. Then

> 1
+
l

Moreover, if Fix) has zero expectation, then


1 ft
ft 2 5 '

PROOF. Let us prove the first assertion of the theorem. Consider the function

    v(t) = |f(t)| − (1 − β₁t),    t ≥ 0.

By virtue of Theorem 2.3.5, this function does not decrease; therefore for any
t₀ ≤ R₀, we have v(t₀) ≤ v(R₀), i.e.,

    |f(t₀)| − (1 − β₁t₀) ≤ −(1 − β₁R₀),

or

    R₀ ≥ (1/β₁)(|f(t₀)| + β₁t₀).

Set t₀ = 1/√β₂ (it is easy to see that 1/√β₂ ≤ R₀ because, by virtue of Theo-
rem 2.10.1, R₀ ≥ π/(2σ) and, by the Lyapunov inequality, σ ≤ √β₂). Then,
using Theorem 2.3.2, we obtain

    R₀ ≥ (1/β₁)(1 − β₂t₀²/2) + t₀ = 1/(2β₁) + 1/√β₂.

The second assertion of the theorem can be obtained in exactly the same
way.

In the case of R₀, a more accurate estimate can be obtained than that given
by Theorem 2.10.3 if we use Lemma 2.10.1 instead of inequality (2.3.5).

THEOREM 2.10.4. Let F(x) have finite variance σ². Then

    R₀ ≥ (1/β₁) √(1 − β₁²/σ²) + (1/σ) arccos √(1 − β₁²/σ²).          (2.10.6)
PROOF. Let 0 < a < 1, and let t_a be defined by (2.10.1). Consider the function

    v(t) = |f(t)| − (1 − β₁t),    t ≥ 0.

By virtue of Theorem 2.3.5, this function does not decrease; therefore

    a − 1 + β₁t_a = v(t_a) ≤ v(R₀) = −(1 − β₁R₀),

i.e.,

    R₀ ≥ a/β₁ + t_a,

or, by virtue of Lemma 2.10.1,

    R₀ ≥ a/β₁ + (1/σ) arccos a.                                      (2.10.7)

Setting

    a = √(1 − β₁²/σ²)

(this value maximizes the right-hand side of (2.10.7)), we obtain (2.10.6).
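With the constants as reconstructed in (2.10.6), the bound can be checked against distributions whose first zero is known exactly:

```python
import math

def bound(beta1, sigma):
    # right-hand side of (2.10.6)
    a = math.sqrt(1 - beta1 ** 2 / sigma ** 2)
    return a / beta1 + math.acos(a) / sigma

# Uniform(-1, 1): beta1 = 1/2, sigma^2 = 1/3, and R0 = pi (f(t) = sin t / t)
assert bound(0.5, math.sqrt(1 / 3)) <= math.pi

# Two-point law on {-1, +1}: beta1 = sigma = 1, R0 = pi/2 (f(t) = cos t);
# here the bound reduces to arccos(0) = pi/2 and is attained
assert abs(bound(1.0, 1.0) - math.pi / 2) < 1e-12
print(bound(0.5, math.sqrt(1 / 3)), bound(1.0, 1.0))
```

For the uniform law the bound gives about 2.81 against the true value π ≈ 3.14, noticeably better than the estimate of Theorem 2.10.3 for the same distribution.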

Perhaps more accurate lower bounds for the first positive zero of a char-
acteristic function or its real part can be obtained if we use moments of higher
(than second) order. For example, there are some grounds to expect that

    r₀ ≥ c β₆^{−1/6},

where β₆ is the sixth moment and c is some positive constant.



2.11. Notes

Theorems 2.2.1–2.2.4 are due to (Ushakov, 1997). Theorem 2.2.4 to some extent
improves inequalities from (Doob, 1953). Theorem 2.3.1 is due to (Sapogov,
1979). Theorem 2.3.6 was obtained by (Prawitz, 1973; Prawitz, 1975) in the
case δ = 1. In (Ushakov & Ushakov, 1999) it was shown that the same approach
is valid in the general case 0 < δ < 1. Theorem 2.3.8 is due to (Prawitz, 1975).
Some other inequalities of this kind are contained in (Prawitz, 1973; Prawitz,
1975; Prawitz, 1991).
    Section 2.4 is mainly based on (Ushakov, 1981a). Theorems 2.4.1, 2.4.2,
and their corollaries are taken from this work. Theorem 2.4.3 was originally
obtained in (Prokhorov, 1962) for symmetric distributions, and in the general
case, in (Ushakov, 1981a).
    Theorem 2.5.1 and its corollaries, as well as Theorem 2.5.2, were obtained
by (Ushakov, 1997). Theorems 2.5.3 and 2.5.4 are due to (Ushakov & Ushakov,
1999a). Results of Section 2.6 are simple consequences of results of Sections 2.4
and 2.5.
    Theorem 2.7.3 was obtained in (Csörgő, 1981b); Theorems 2.7.4 and 2.7.5
are due to (Ushakov & Ushakov, 1999b). Theorems 2.7.7–2.7.12 immediately
follow from their univariate analogs. Theorem 2.7.17 is taken from (Ushakov,
1981b).
    Theorems 2.8.1 and 2.8.2 are due to (Prawitz, 1972). Theorem 2.8.3 fol-
lows from the results of (Prawitz, 1972); see also (Bentkus & Götze, 1996).
Theorem 2.8.4 and its corollaries were obtained in (Daugavet & Petrov, 1987).
Some extensions and generalizations of these results are contained in (Sa-
likhov, 1996). The main results of Section 2.9 (Theorems 2.9.1–2.9.5) follow
from (Prawitz, 1972); see also (Bentkus & Götze, 1996). Theorem 2.9.6 is due
to (Ushakov & Ushakova, 1995).
    Theorem 2.10.1 was obtained in (Sakovich, 1965); other results of Sec-
tion 2.10 have not been published before.
3

Empirical characteristic functions

Empirical characteristic functions were originally introduced as a tool to solve


statistical problems related to stable laws, more exactly, in the parameter es-
timation of a stable law, i.e., in the situation where a description in terms
of characteristic functions is much simpler than that in terms of distribution
functions. Later, however, it turned out that there was a wide range of statisti-
cal problems which empirical characteristic functions seemed to be applicable
to and where the empirical characteristic function approach was a competitive
alternative for methods based on the use of empirical distribution functions.
The reason is that characteristic functions allow an easy characterization of
important properties of probability distributions, such as symmetry, indepen-
dence (of marginals), unimodality, etc. It is therefore not difficult to suggest
procedures based on empirical characteristic functions for areas of inference
such as testing for goodness-of-fit, independence, and so on.
In this chapter, we study empirical characteristic functions and present
some of their applications. We begin with the properties of empirical charac-
teristic functions as statistical estimators of the corresponding characteristic
functions. These results are the theoretical foundation for the statistical infer-
ence based on empirical characteristic functions which is given in the remain-
der of the chapter and includes parametric and non-parametric estimations,
testing for independence, symmetry and normality, and goodness-of-fit testing.
When we deal with multi-dimensional random vectors, we use the same
notation as in Sections 1.8 and 2.7. In particular, row-vectors are denoted by
bold lowercase letters, bold capital letters denote matrices as well as random
vectors. The prime denotes transpose.

3.1. Definition and basic properties


Consider a random sample from a distribution F, i.e., let X₁, ..., X_n be indepen-
dent and identically distributed random variables with common distribution
function F(x). Denote the characteristic function of F(x) by f(t). The empirical

160 3. Empirical characteristic functions

distribution function F_n(x) associated with the sample X₁, ..., X_n is defined as

    F_n(x) = (1/n) Σ_{i=1}^{n} I_i(x),

where

    I_i(x) = { 1,  X_i ≤ x,
             { 0,  X_i > x.

In other words, the empirical distribution function is defined as

    F_n(x) = { 0,     x < X_(1),
             { r/n,   X_(r) ≤ x < X_(r+1),
             { 1,     x ≥ X_(n),

where X_(r) is the rth order statistic associated with the sample X₁, ..., X_n.


The expectation and variance of the empirical distribution function are

    EF_n(x) = F(x)

(so, the empirical distribution function is an unbiased estimator of the corre-
sponding distribution function) and

    Var F_n(x) = (1/n) F(x)[1 − F(x)].

The empirical distribution function is a (strongly) consistent estimator of the
corresponding distribution function, that is,

    P( lim_{n→∞} sup_x |F_n(x) − F(x)| = 0 ) = 1

(the Glivenko–Cantelli theorem). Numerical characteristics and parameters of
the empirical distribution function (moments, quantiles and so on) are called
empirical (or sample) characteristics and parameters. Each realization of
an empirical distribution function is an ordinary purely discrete distribution
function having at most n jumps.
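The Glivenko–Cantelli convergence is easy to watch in a simulation. For a Uniform(0,1) sample the supremum distance sup_x |F_n(x) − x| can be computed exactly from the order statistics, as the sketch below does for increasing sample sizes.

```python
import random

random.seed(1)

def ks_uniform(n):
    # exact sup_x |F_n(x) - x| for a Uniform(0,1) sample, via order statistics
    xs = sorted(random.random() for _ in range(n))
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

for n in (100, 1000, 10000):
    print(n, ks_uniform(n))
```

The printed distances shrink roughly like 1/√n, in accordance with the theorem (and with the finer asymptotics discussed in the next section).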
The characteristic function of the empirical distribution function is called
the empirical characteristic function associated with the sample X₁, ..., X_n and
is denoted f_n(t). In other words, the empirical characteristic function is defined
as

    f_n(t) = ∫_{−∞}^{∞} e^{itx} dF_n(x) = (1/n) Σ_{j=1}^{n} e^{itX_j}.

The empirical characteristic function is a random function all of whose realiza-
tions are characteristic functions of discrete (concentrated in at most n points)
3.1. Definition and basic properties 161

distributions. This implies that each realization of any empirical characteristic
function satisfies the following conditions (the relations below concern any single
realization of an empirical characteristic function, or can be regarded as con-
cerning an empirical characteristic function itself with probability one):

(1) f_n(0) = 1;

(2) |f_n(t)| ≤ 1;

(3) f_n(−t) = conj f_n(t), the complex conjugate of f_n(t);

(4) lim sup_{|t|→∞} |f_n(t)| = 1;

(5) f_n(t) has derivatives of all orders and, moreover, is analytic.

It is easy to see that the empirical characteristic function is an unbiased
estimator of the corresponding characteristic function, i.e., Ef_n(t) = f(t). Let
us calculate some other moments. We have

    E(f_n(t₁) − f(t₁))(f_n(t₂) − f(t₂)) = E f_n(t₁) f_n(t₂) − f(t₁) f(t₂)
        = E( (1/n) Σ_{j=1}^{n} e^{it₁X_j} · (1/n) Σ_{k=1}^{n} e^{it₂X_k} ) − f(t₁) f(t₂)
        = (1/n²) Σ_{j=1}^{n} Σ_{k=1}^{n} E e^{it₁X_j} e^{it₂X_k} − f(t₁) f(t₂)
        = (1/n) E e^{i(t₁+t₂)X₁} + ((n−1)/n) E e^{it₁X₁} E e^{it₂X₁} − f(t₁) f(t₂)
        = (1/n) [f(t₁ + t₂) − f(t₁) f(t₂)].                          (3.1.1)

Similarly,

    cov(f_n(t₁), f_n(t₂)) = E(f_n(t₁) − f(t₁)) conj(f_n(t₂) − f(t₂))
        = (1/n) [f(t₁ − t₂) − f(t₁) conj f(t₂)].                     (3.1.2)

In particular,

    E|f_n(t) − f(t)|² = (1/n) [1 − |f(t)|²].                         (3.1.3)



162 3. Empirical characteristic functions

Denote the real and the imaginary parts of fit) by uit), v(t) and those of fn(t)
by un(t) and vn{t) respectively. Then

1 n
Unit) = - ) COSitXj),
j=
n
nit) = -1 V
sin(fcX,).

Obviously,
Un{t) = uit), Ew(f) = vit).
Find cross-covariances of unit) and vnit). Observe t h a t

c o s i t X ) cosisX) = - [E cosiit + s)X) + cos((i - s)Z)].


2
Using this relation, we derive

cov(u(ii),w n (i 2 )) = E(u n (*i) - uiti))iunit2) - uit2))


= uniti)unit2) - uiti)uit2)
n
1 E C0S
= ~2
n (tiXj) cosit-^Xj)

+ 4z V E cos(iiX)E cos it-iXk) - uiti)uit2)


n r-f
j*k

= w-luiti +12) + uit 1 - t2) - 2uih)uit 2 )], (3.1.4)


2
and similarly,

cov(i>re(ii),i>n(i2)) = ^-[uiti - t2) - uit +12) - 2()( 2 )1,


Zn
(3.1.5)

coviuniti),vnit2)) = ^-[viti +12) - viti - t2) - 2uit\)vit2)].


2n
(3.1.6)

Relation (3.1.3) implies t h a t at every fixed point t, fnit) converges in mean


square to fit) as > oo:

lim E|/(f) - /(f)I 2 = 0.


71- OO

In addition, by virtue of the strong law of large numbers, fnit) converges almost
surely to fit) at any fixed point.
3.1. Definition and basic properties 163

In the multi-dimensional case, the definition of the empirical characteristic


function and its basic properties are similar to those in the univariate case.
Let Xi, ...,X be independent m-dimensional random vectors with common
distribution function F(x), e R m . Denote the characteristic function of F(x)
by f(t), and the empirical distribution function associated with the sample
Xi,..., X by F(x). The empirical characteristic function associated with the
sample , ...,X is defined as

As in the univariate case, the multi-dimensional empirical characteristic


function is an unbiased estimator of the corresponding characteristic function.
At each fixed point it converges to the latter both almost surely and in mean
square. Relations (3.1.1M3.1.6) are also true in the multi-dimensional case
and can be obtained for the multivariate empirical characteristic function in
exactly the same way as in the one-dimensional case.
Together with the empirical characteristic function, some its modifications
can be used for estimating a characteristic function as well as for solving
other problems if some a priori information is available about the underlying
distribution. For instance, if it is known that the underlying distribution is
symmetric about the origin, then it is reasonable to use the real part of the
empirical characteristic function instead of itself. This is motivated by the
following evident assertion: if fit) andg(i) are two characteristic functions and
f(t) is symmetric, then

\fit)-Rgit)\ < |/(f)-s(i)|.

In other words, the real part of fn(t) is always closer to fit) than fnit) itself if
fit) is symmetric.
If a sample under consideration consists of multi-dimensional random vec-
tors, and it is known that their components are independent, then it is reason-
able to use the product of the one-dimensional marginal empirical characteristic
functions

    \prod_{j=1}^{m} f_n(0, ..., 0, t_j, 0, ..., 0)

instead of the multivariate empirical characteristic function f_n(t).
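A sketch of this product-of-marginals estimator (our own code; the function names are illustrative):

```python
import numpy as np

def ecf_1d(t, xs):
    """Univariate ECF at a scalar point t."""
    return np.exp(1j * t * np.asarray(xs)).mean()

def product_marginal_ecf(t_vec, sample):
    """prod_j f_n(0,...,0,t_j,0,...,0): the product of the marginal ECFs,
    natural when the components are known to be independent."""
    sample = np.atleast_2d(sample)        # (n, m)
    out = 1.0 + 0.0j
    for j, tj in enumerate(t_vec):
        out *= ecf_1d(tj, sample[:, j])
    return out

# For independent N(0,1) components the target is exp(-||t||^2 / 2).
rng = np.random.default_rng(4)
z = rng.standard_normal((20000, 2))
est = product_marginal_ecf([0.5, -0.5], z)
```

Under independence this estimator uses the same data to estimate each one-dimensional factor, which typically reduces variance compared with the full multivariate ECF.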


If the underlying distribution is absolutely continuous, then one can use
the following modification of the empirical characteristic function:

    f_n^*(t) = \begin{cases} f_n(t), & \|t\| \le T_n, \\ 0, & \|t\| > T_n, \end{cases}    (3.1.7)

where T_n \to \infty and (\log T_n)/n \to 0 as n \to \infty; and if it is known that the underlying
distribution is either absolutely continuous or discrete, then the function

    \hat{f}_n(t) = \begin{cases} f_n(t), & \sup_x (F_n(x + 0) - F_n(x - 0)) > 1/n, \\ f_n^*(t), & \sup_x (F_n(x + 0) - F_n(x - 0)) = 1/n, \end{cases}    (3.1.8)

can be used as an estimator of the underlying characteristic function, where,
as usual, F_n(x) denotes the empirical distribution function corresponding to
f_n(t). It will be shown in the next section that estimators (3.1.7) and (3.1.8)
have better asymptotic properties than the empirical characteristic function.
At the same time, these estimators have an essential disadvantage: their
realizations are not characteristic functions.
Since the properties of multi-dimensional empirical characteristic func-
tions are often the same as in the univariate case, in what follows we shall
usually not separate the univariate case and will formulate and prove results
in the general (multi-dimensional) form. Sometimes, however, for the sake of
simplicity, we restrict our considerations to the one-dimensional case, usually
when the multivariate notation and calculations are cumbersome.

Throughout the chapter, f_n denotes the empirical characteristic function
corresponding to a characteristic function f, where n is the sample size.

3.2. Asymptotic properties of empirical


characteristic functions
Throughout this section, X_1, ..., X_n is a random sample (independent and iden-
tically distributed random vectors) from a distribution function F(x), x \in R^m; X
denotes a random vector having the same distribution as X_1. In the univariate
case we write X_1, ..., X_n.
In this section, we study the limiting behavior of the empirical character-
istic function when the size of a sample tends to infinity. We start with the
consistency of the empirical characteristic function as an estimator of the un-
derlying characteristic function. In the preceding section, it was pointed out
that at every fixed point t \in R^m it is consistent; more exactly, f_n(t) converges
to f(t) both almost surely and in mean square. The Glivenko-Cantelli theo-
rem implies that f_n(t) is almost surely consistent uniformly on each bounded
subset of R^m:

    \lim_{n\to\infty} \sup_{\|t\| \le T} |f_n(t) - f(t)| = 0    (3.2.1)

almost surely for any fixed positive T < \infty. On the other hand, in the general
case, the empirical characteristic function is not consistent (even in the sense
of convergence in probability) uniformly on the whole space. Indeed, let
m = 1. Then for any n,
    \limsup_{|t|\to\infty} |f_n(t)| = 1

with probability 1, because each realization of f_n(t) is the characteristic function
of a discrete distribution; therefore, if f(t) is the characteristic function of an
absolutely continuous distribution, that is,

    \lim_{|t|\to\infty} |f(t)| = 0,

then

    \limsup_{|t|\to\infty} |f_n(t) - f(t)| = 1

with probability one. In fact,

    P\Bigl( \limsup_{|t|\to\infty} |f_n(t) - f(t)| > 0 \Bigr) = 1

whenever the underlying distribution contains a continuous (maybe sin-
gular) component (in the sense of a mixture). It turns out that (3.2.1) holds
when T = T_n depends on n and tends to infinity as n \to \infty, but not too fast.
The sharp result is the following theorem.

THEOREM 3.2.1. If

    \lim_{n\to\infty} \frac{\log T_n}{n} = 0,

then

    \lim_{n\to\infty} \sup_{\|t\| \le T_n} |f_n(t) - f(t)| = 0

almost surely for any characteristic function f(t).

PROOF. Denote the distribution function corresponding to the characteristic
function f(t) by F(x) and the empirical distribution function by F_n(x). Let
\varepsilon > 0 be arbitrarily small, \varepsilon < 2, and choose K = K(\varepsilon, F) so large that

    \int_{\|x\| > K} dF(x) < \frac{\varepsilon}{8}.

We set

    b(t) = \int_{\|x\| \le K} e^{i(t,x)}\, dF(x),

    b_n(t) = \int_{\|x\| \le K} e^{i(t,x)}\, dF_n(x) = \frac{1}{n} \sum_{j=1}^{n} e^{i(t,X_j)} I_{\{\|X_j\| \le K\}}.
Writing d_n(t) = b_n(t) - b(t), we obtain

    \sup_{\|t\| \le T_n} |f_n(t) - f(t)| \le \sup_{\|t\| \le T_n} |d_n(t)| + \sup_{\|t\| \le T_n} |b_n(t) - f_n(t)| + \sup_{\|t\| \le T_n} |b(t) - f(t)|.

The second term does not exceed

    \frac{1}{n} \sum_{j=1}^{n} I_{\{\|X_j\| > K\}},

and this bound converges almost surely to \int_{\|x\| > K} dF(x), which is also a bound
for the third term.
Let us cover the cube [-T_n, T_n]^m by N_n = ([8Km^{3/2}T_n/\varepsilon] + 1)^m (here the
square brackets denote the integer part) disjoint small cubes \Delta_1, ..., \Delta_{N_n} whose
edges are of length \varepsilon/(4Km^{3/2}), and let t_1, ..., t_{N_n} be the centers of these cubes.
Then

    \sup_{\|t\| \le T_n} |d_n(t)| \le \max_{1 \le k \le N_n} |d_n(t_k)| + \max_{1 \le k \le N_n} \sup_{t \in \Delta_k} |d_n(t) - d_n(t_k)|
                                \le \max_{1 \le k \le N_n} |d_n(t_k)| + \frac{\varepsilon}{4},

since

    |d_n(s) - d_n(t)| \le |b_n(s) - b_n(t)| + |b(s) - b(t)|
                      \le \int_{\|x\| \le K} |(s - t, x)|\, dF_n(x) + \int_{\|x\| \le K} |(s - t, x)|\, dF(x)
                      \le 2K\|s - t\|, \qquad s, t \in R^m.

Summing up, we see that

    \sup_{\|t\| \le T_n} |f_n(t) - f(t)| \le \max_{1 \le k \le N_n} |d_n(t_k)| + \frac{\varepsilon}{2}    (3.2.2)

almost surely for n large enough, the threshold depending on \varepsilon. Now

    p_n = P\Bigl( \max_{1 \le k \le N_n} |d_n(t_k)| > \frac{\varepsilon}{2} \Bigr)
        \le N_n \sup_{t \in R^m} P\Bigl( |d_n(t)| > \frac{\varepsilon}{2} \Bigr)
        \le M T_n^m \sup_{t \in R^m} \Bigl[ P\Bigl( \frac{1}{n} \Bigl| \sum_{j=1}^{n} R_j(t) \Bigr| > \frac{\varepsilon}{4} \Bigr) + P\Bigl( \frac{1}{n} \Bigl| \sum_{j=1}^{n} I_j(t) \Bigr| > \frac{\varepsilon}{4} \Bigr) \Bigr]

with some constant M = M(\varepsilon, F, m), where the random variables

    R_j(t) = \cos(t, X_j)\, I_{\{\|X_j\| \le K\}} - \int_{\|x\| \le K} \cos(t, x)\, dF(x), \qquad j = 1, ..., n,

are independent, |R_j(t)| \le 2, \mathrm{E} R_j(t) = 0, and

    v^2(t) = \mathrm{E} R_j^2(t) = \int_{\|x\| \le K} \cos^2(t, x)\, dF(x) - \Bigl( \int_{\|x\| \le K} \cos(t, x)\, dF(x) \Bigr)^2 \le 1.
The random functions I_j(t), j = 1, ..., n, are defined with the cosine function
replaced by the sine, and hence these are also independent and identically
distributed with |I_j(t)| \le 2, \mathrm{E} I_j(t) = 0 and \mathrm{E} I_j^2(t) \le 1. Therefore the Bernshtein
inequality (see, e.g. (Petrov, 1975)) yields

    P\Bigl( \frac{1}{n} \Bigl| \sum_{j=1}^{n} R_j(t) \Bigr| > \frac{\varepsilon}{4} \Bigr) \le 2 \exp\Bigl\{ -\frac{n\varepsilon^2}{32 v^2(t) + 16\varepsilon/3} \Bigr\}.

Since v^2(t) \le 1 and \varepsilon < 2, the probability in question is no greater than
2\exp\{-n\varepsilon^2/64\}, and the same is true for the other one with the I_j's. Thus,

    p_n \le 4 M T_n^m \exp\{-n\varepsilon^2/64\}.

Let \delta < \varepsilon^2/(64m). Then T_n \le \exp\{\delta n\} for sufficiently large n, hence \sum_n p_n <
\infty, and the Borel-Cantelli lemma and relation (3.2.2) yield the desired result.
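The non-uniformity over the whole line that Theorem 3.2.1 works around is easy to observe numerically. For n = 2 one has |f_2(t)| = |cos(t(X_1 - X_2)/2)|, which returns to 1 at t = 2\pi k/(X_1 - X_2). A small illustration (our own, not from the book):

```python
import numpy as np

def ecf(t, sample):
    """Univariate ECF at scalar t."""
    return np.exp(1j * t * np.asarray(sample)).mean()

rng = np.random.default_rng(1)
x = rng.standard_normal(2)                  # two N(0,1) observations
# |f_2(t)| = |cos(t (X_1 - X_2)/2)| equals 1 at t = 2*pi*k/(X_1 - X_2), so far
# from the origin, where f(t) = exp(-t^2/2) is ~0, the error returns to ~1.
t_star = 200.0 * np.pi / abs(x[0] - x[1])   # a distant point with |f_2(t_star)| = 1
err_far = abs(ecf(t_star, x) - np.exp(-t_star**2 / 2.0))
```

Here err_far is numerically 1, even though on any fixed compact set the error would shrink as the sample grows.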

The result given by Theorem 3.2.1 is sharp, as demonstrated in the following theorem.

THEOREM 3.2.2. If

    \lim_{t_k\to\infty} |f(t_1, ..., t_k, ..., t_m)| = 0

for some k, 1 \le k \le m, and

    \limsup_{n\to\infty} \frac{\log T_n}{n} > 0,

then there exists a positive \varepsilon such that

    \limsup_{n\to\infty} P\Bigl( \sup_{\|t\| \le T_n} |f_n(t) - f(t)| > \varepsilon \Bigr) > 0.

To prove the theorem, we need the following lemma. Assume that m = 1
and denote

    S_n(t) = \sum_{j=1}^{n} e^{itX_j} = n f_n(t).

LEMMA 3.2.1. Let \mathcal{N} = \{n_k\}_{k=1}^{\infty} denote an arbitrary non-decreasing sequence
of natural numbers and

    p_a(\mathcal{N}) = \sup_{M > 0} \inf_{K > 0} \limsup_{k\to\infty} P\Bigl( \sup_{K \le t \le a^{n_k}} \frac{|S_{n_k}(t)|}{n_k} > M \Bigr).

Then

    p_a(\mathcal{N}) > 0

for any a > 1.

PROOF. We set

    \beta(\mathcal{N}) = \inf\{a \colon p_a(\mathcal{N}) > 0\}.

What we have to show is that \beta(\mathcal{N}) = 1. First we establish the following
properties of \beta(\mathcal{N}):

(i) \beta(\mathcal{N}) \le 6 for every \mathcal{N};

(ii) if \mathcal{M} = \{m_k\}_{k=1}^{\infty} is another sequence of positive integers with

    n_k - m_k = O(1), \qquad k \to \infty,    (3.2.3)

then \beta(\mathcal{M}) = \beta(\mathcal{N});

(iii) if 2\mathcal{N} = \{2n_k\}_{k=1}^{\infty}, then (\beta(2\mathcal{N}))^2 = \beta(\mathcal{N}).

The proof of (i) is based on the Dirichlet theorem from the theory of Dio-
phantine approximations which asserts that if y_1, ..., y_n are arbitrary real num-
bers, K > 0 and a > 1, then there exists an integer t \in [K, Ka^n] such that with
appropriate integers v_1, ..., v_n the inequalities

    |ty_j - v_j| < \frac{1}{a}, \qquad j = 1, ..., n,

are satisfied simultaneously. Applying this with a = 5 and y_j = x_j/(2\pi), j =
1, ..., n, we see that for arbitrary real numbers x_1, ..., x_n and K > 0 there exists
an integer t, K \le t \le K \cdot 5^n, such that

    |S_n(t)| \ge \sum_{j=1}^{n} \cos(tx_j) \ge n \cos(2\pi/5).

Since for every fixed K, we have K \cdot 5^n \le 6^n for all sufficiently large n, it follows
that

    \sup_{K \le t \le 6^n} \frac{|S_n(t)|}{n} \ge \cos(2\pi/5) > 0.

This means that p_6(\mathcal{N}) = 1, and hence (i) is proved.


Now suppose (3.2.3) and let a > \beta(\mathcal{N}). If we choose \bar{a} in between, \beta(\mathcal{N}) <
\bar{a} < a, then there exist an M > 0 and a K > 0 such that

    \limsup_{k\to\infty} P\Bigl( \sup_{K \le t \le \bar{a}^{n_k}} \frac{|S_{n_k}(t)|}{n_k} > M \Bigr) > 0.

But (3.2.3) implies that for all sufficiently large k, we have a^{m_k} \ge \bar{a}^{n_k} and

    |S_{m_k}(t)| \ge |S_{n_k}(t)| - |n_k - m_k| \ge |S_{n_k}(t)| - C

for some constant C, and so

    \limsup_{k\to\infty} P\Bigl( \sup_{K \le t \le a^{m_k}} \frac{|S_{m_k}(t)|}{m_k} > \frac{M}{2} \Bigr) > 0.

This means that p_a(\mathcal{M}) > 0. Since this is true for all a > \beta(\mathcal{N}), we can
conclude that \beta(\mathcal{M}) \le \beta(\mathcal{N}). Reversing the roles of \mathcal{N} and \mathcal{M}, we obtain the
opposite inequality, and hence (ii) is also proved.
Let us prove (iii). Introduce the following subsequences of the original
sequence X = \{X_1, X_2, ...\}:

    X^{(1)} = \{X_1, X_4, X_7, X_{10}, ...\},
    X^{(2)} = \{X_2, X_5, X_8, X_{11}, ...\},
    Y^{(1)} = \{X_2, X_3, X_5, X_6, X_8, X_9, X_{11}, X_{12}, ...\},
    Y^{(2)} = \{X_1, X_3, X_4, X_6, X_7, X_9, X_{10}, X_{12}, ...\},
    Y^{(3)} = \{X_1, X_2, X_4, X_5, X_7, X_8, X_{10}, X_{11}, ...\}.

Let a < \beta(\mathcal{N}). For each k,

    S_{2n_k}(t; Y^{(3)}) = S_{n_k}(t; X^{(1)}) + S_{n_k}(t; X^{(2)}),

whence

    p_{\sqrt{a}}(2\mathcal{N}) = \lim_{M\downarrow 0} \lim_{K\to\infty} \limsup_{k\to\infty} P\Bigl( \sup_{K \le t \le (\sqrt{a})^{2n_k}} \frac{|S_{2n_k}(t; Y^{(3)})|}{2n_k} > M \Bigr)
    \le \lim_{M\downarrow 0} \lim_{K\to\infty} \limsup_{k\to\infty} \Bigl[ P\Bigl( \sup_{K \le t \le a^{n_k}} \frac{|S_{n_k}(t; X^{(1)})|}{2n_k} > \frac{M}{2} \Bigr)
    + P\Bigl( \sup_{K \le t \le a^{n_k}} \frac{|S_{n_k}(t; X^{(2)})|}{2n_k} > \frac{M}{2} \Bigr) \Bigr] = 0 + 0 = 0,

where at the last step we used a < \beta(\mathcal{N}). Thus, a < \beta(\mathcal{N}) yields \sqrt{a} \le \beta(2\mathcal{N}).
Therefore \beta(\mathcal{N}) \le (\beta(2\mathcal{N}))^2.
Now let a < \beta(2\mathcal{N}). It is obvious that

    S_{n_k}(t; X^{(1)}) = \frac{1}{2}\bigl[ S_{2n_k}(t; Y^{(2)}) + S_{2n_k}(t; Y^{(3)}) - S_{2n_k}(t; Y^{(1)}) \bigr].

Hence, similarly to what we had above,

    p_{a^2}(\mathcal{N}) = \lim_{M\downarrow 0} \lim_{K\to\infty} \limsup_{k\to\infty} P\Bigl( \sup_{K \le t \le (a^2)^{n_k}} \frac{|S_{n_k}(t; X^{(1)})|}{n_k} > M \Bigr)
    \le \lim_{M\downarrow 0} \lim_{K\to\infty} \limsup_{k\to\infty} \Bigl[ P\Bigl( \sup_{K \le t \le a^{2n_k}} \frac{|S_{2n_k}(t; Y^{(1)})|}{2n_k} > \frac{M}{3} \Bigr)
    + P\Bigl( \sup_{K \le t \le a^{2n_k}} \frac{|S_{2n_k}(t; Y^{(2)})|}{2n_k} > \frac{M}{3} \Bigr)
    + P\Bigl( \sup_{K \le t \le a^{2n_k}} \frac{|S_{2n_k}(t; Y^{(3)})|}{2n_k} > \frac{M}{3} \Bigr) \Bigr] = 0 + 0 + 0 = 0,

i.e., a < \beta(2\mathcal{N}) yields a^2 \le \beta(\mathcal{N}). Therefore the opposite inequality
(\beta(2\mathcal{N}))^2 \le \beta(\mathcal{N}) also follows, and hence we have (iii).
Thus, properties (i)-(iii) are validated. Now, for a positive integer m, we set

    \mathcal{N}_m = \{ 2^m \lceil n_k/2^m \rceil \}_{k=1}^{\infty},

where \lceil x \rceil denotes the smallest integer that is not less than x. Since for fixed m,

    2^m \lceil n_k/2^m \rceil - n_k = O(1), \qquad k \to \infty,

by property (ii) we obtain

    \beta(\mathcal{N}_m) = \beta(\mathcal{N}),

and, by the m-fold application of property (iii),

    \beta(\mathcal{N}_m) = \bigl( \beta(\{\lceil n_k/2^m \rceil\}_{k=1}^{\infty}) \bigr)^{2^{-m}}.

Thus, by property (i),

    \beta(\mathcal{N}) \le 6^{2^{-m}},

and since this is true for any integer m \ge 1, the equality \beta(\mathcal{N}) = 1 follows.

PROOF OF THEOREM 3.2.2. Since

    \sup_{\|(t_1, ..., t_m)\| \le T_n} |f_n(t_1, ..., t_m) - f(t_1, ..., t_m)|
    \ge \sup_{-T_n \le t_k \le T_n} |f_n(0, ..., 0, t_k, 0, ..., 0) - f(0, ..., 0, t_k, 0, ..., 0)|,

where f_n(0, ..., 0, t_k, 0, ..., 0) is the empirical characteristic function of the kth
components of X_1, ..., X_n and f(0, ..., 0, t_k, 0, ..., 0) is the common characteristic
function of these components, it is enough to prove the theorem in the
univariate case. We assume therefore that m = 1, i.e., that X_1, ..., X_n are
independent real random variables with common characteristic function f(t),
-\infty < t < \infty, with \lim_{|t|\to\infty} |f(t)| = 0.

By assumption, there is a \gamma > 0 such that T_{n_k} \ge e^{\gamma n_k} for some subsequence
n_k of the positive integers. On applying Lemma 3.2.1 with a = e^{\gamma} > 1, we
obtain an M > 0 and a \delta > 0 such that

    \limsup_{k\to\infty} P\Bigl( \sup_{K \le t \le e^{\gamma n_k}} |f_{n_k}(t)| > M \Bigr) > \delta

for every K > 0. Choosing K so large that |f(t)| < M/2 for t \ge K and then setting
\varepsilon = M/2, we obtain

    \limsup_{k\to\infty} P\Bigl( \sup_{|t| \le T_{n_k}} |f_{n_k}(t) - f(t)| > \varepsilon \Bigr) \ge \limsup_{k\to\infty} P\Bigl( \sup_{K \le t \le e^{\gamma n_k}} |f_{n_k}(t)| > M \Bigr) > \delta,

which is the desired result.



Although Theorem 3.2.1 cannot be improved in the general case, this can
be done under the additional condition that the underlying distribution is
discrete.

THEOREM 3.2.3. If f(t) is the characteristic function of a purely discrete distri-
bution, then

    \lim_{n\to\infty} \sup_{t \in R^m} |f_n(t) - f(t)| = 0

almost surely.
PROOF. Let X_j take values x_1, x_2, ... with probabilities p_1, p_2, .... For an arbi-
trary \varepsilon > 0 there exists k_0 such that

    \sum_{k=k_0+1}^{\infty} p_k < \frac{\varepsilon}{6}.    (3.2.4)

Denote A = \{x_{k_0+1}, x_{k_0+2}, ...\}. By virtue of the strong law of large numbers, for
this \varepsilon there exists n_0 such that for n \ge n_0,

    \sum_{k=1}^{k_0} \Bigl| \frac{1}{n} \sum_{j=1}^{n} I_{\{X_j = x_k\}} - p_k \Bigr| < \frac{\varepsilon}{2}    (3.2.5)

and

    \frac{1}{n} \sum_{j=1}^{n} I_{\{X_j \in A\}} \to \sum_{k=k_0+1}^{\infty} p_k

almost surely. The latter implies, due to (3.2.4), that

    \frac{1}{n} \sum_{j=1}^{n} I_{\{X_j \in A\}} < \frac{\varepsilon}{3}    (3.2.6)

almost surely for all sufficiently large n. Using (3.2.4)-(3.2.6), we obtain

    |f_n(t) - f(t)| = \Bigl| \frac{1}{n} \sum_{j=1}^{n} e^{i(t, X_j)} - \sum_{k=1}^{\infty} p_k e^{i(t, x_k)} \Bigr|
    \le \sum_{k=1}^{k_0} \Bigl| \frac{1}{n} \sum_{j=1}^{n} I_{\{X_j = x_k\}} - p_k \Bigr| + \frac{1}{n} \sum_{j=1}^{n} I_{\{X_j \in A\}} + \sum_{k=k_0+1}^{\infty} p_k
    < \frac{\varepsilon}{2} + \frac{\varepsilon}{3} + \frac{\varepsilon}{6} = \varepsilon,

which, since \varepsilon is arbitrary, proves the assertion of the theorem.
As mentioned above, the empirical characteristic function is not a consis-
tent (almost surely) estimator uniformly on the whole space if the underlying
distribution is absolutely continuous. However, in this case, the empirical char-
acteristic function can be modified in such a way that the resulting estimator
is uniformly consistent. For instance, the estimator

    f_n^*(t) = \begin{cases} f_n(t), & \|t\| \le T_n, \\ 0, & \|t\| > T_n, \end{cases}

where T_n \to \infty and (\log T_n)/n \to 0 as n \to \infty, is consistent (almost surely) uniformly
on the whole space if the underlying distribution is absolutely continuous.
Indeed,

    \sup_{t \in R^m} |f_n^*(t) - f(t)| \le \sup_{\|t\| \le T_n} |f_n(t) - f(t)| + \sup_{\|t\| > T_n} |f(t)|.

The first term on the right-hand side tends to zero almost surely as n \to \infty
by virtue of Theorem 3.2.1; the second term tends to zero because f(t) is the
characteristic function of an absolutely continuous distribution. The following
questions arise in relation to the estimator f_n^*(t): does there exist an estimator
of the characteristic function which is consistent uniformly on the whole space
for

(a) all characteristic functions,

(b) characteristic functions of all singular distributions?


The realizations of the estimator f_n^*(t) are not characteristic functions; more-
over, they are not continuous, which is a defect of this estimator. This poses
one more question: does there exist an estimator of the characteristic function
whose realizations are almost surely characteristic functions and which is
consistent uniformly on the whole space for characteristic functions of all
absolutely continuous distributions?
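A sketch of the truncated estimator f_n^*(t) in the univariate case (our own code; the choice T_n = n is one admissible sequence with (log T_n)/n → 0, not one prescribed by the book):

```python
import numpy as np

def truncated_ecf(t, sample, T=None):
    """f_n^*(t): the empirical characteristic function for |t| <= T_n, 0 beyond."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    sample = np.asarray(sample, dtype=float)
    if T is None:
        T = float(len(sample))            # T_n = n: log(T_n)/n -> 0
    vals = np.exp(1j * np.outer(t, sample)).mean(axis=1)
    vals[np.abs(t) > T] = 0.0             # cut off beyond the threshold
    return vals

rng = np.random.default_rng(2)
x = rng.standard_normal(1000)
vals = truncated_ecf([0.5, 2000.0], x)    # the second point lies beyond T_n = 1000
```

Beyond T_n the estimate is 0, matching f(t) → 0 for absolutely continuous laws; this is what restores uniform consistency on the whole line.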
If convergence of the empirical characteristic function is considered on a
compact set or under some restrictions on the underlying distribution (discrete-
ness, finiteness of the first-order moment), then some additional information
about the asymptotic behavior of the empirical characteristic function can be
obtained. First, we formulate a theorem concerning the weak convergence of
the empirical characteristic process, which is defined as

    Y_n(t) = \sqrt{n}\,(f_n(t) - f(t)).


Let B be a compact set in R^m and denote by \mathscr{C}(B) the Banach space of
continuous complex-valued functions on B with the usual sup-norm

    \|x\| = \sup_{t \in B} |x(t)|.

Let Y_n(t) restricted to B be a random element of \mathscr{C}(B) with \mathrm{E} Y_n(t) = 0, the
cross-covariance matrix (see (3.1.4)-(3.1.6))

    C(t, s) = \begin{pmatrix} \mathrm{E}\, U_n(t)U_n(s) & \mathrm{E}\, U_n(t)V_n(s) \\ \mathrm{E}\, V_n(t)U_n(s) & \mathrm{E}\, V_n(t)V_n(s) \end{pmatrix}
            = \begin{pmatrix} \frac{1}{2}[u(t-s) + u(t+s)] - u(t)u(s) & \frac{1}{2}[-v(t-s) + v(t+s)] - u(t)v(s) \\ \frac{1}{2}[v(t-s) + v(t+s)] - v(t)u(s) & \frac{1}{2}[u(t-s) - u(t+s)] - v(t)v(s) \end{pmatrix},

and (see (3.1.2))

    \mathrm{E}\, Y_n(t)\overline{Y_n(s)} = f(t - s) - f(t)f(-s),

where U_n(t) and V_n(t) are the real and imaginary parts of Y_n(t), respectively.
Let Y_f(t) = U(t) + iV(t) be a complex-valued m-variate Gaussian random
field with \mathrm{E} Y_f(t) = 0 and with the same cross-covariance matrix as Y_n(t).
The finite-dimensional distributions of Y_n(t) converge by the multidimensional
central limit theorem to those of Y_f(t) as n \to \infty. However, Y_n(t) does not
always converge weakly in \mathscr{C}(B) to Y_f(t) because the latter process can be
almost surely discontinuous. The theorem below gives necessary and sufficient
conditions for the weak convergence.

Denote

    \varphi(t) = \sqrt{1 - u(t)}

and

    M_{\varphi}(y) = \mathrm{mes}\{t \colon \|t\| \le 1/2, \ \varphi(t) \le y\}, \qquad 0 \le y \le 1,

where mes stands for the Lebesgue measure. Then the non-decreasing rear-
rangement \bar{\varphi}(\lambda) of \varphi(t) is defined as

    \bar{\varphi}(\lambda) = \sup\{y \colon M_{\varphi}(y) \le \lambda\}.

THEOREM 3.2.4. Y_n(t) converges weakly to Y_f(t) in \mathscr{C}(B) if and only if

    \int_0^{1/2} \frac{\bar{\varphi}(\lambda)}{\lambda\,(\log(1/\lambda))^{1/2}}\, d\lambda < \infty.    (3.2.7)

A proof of the theorem is contained in (Marcus, 1981) for the univariate case
and in (Csörgő, 1981c) for the general case (see also (Feuerverger & Mureika,
1977; Csörgő, 1981a)). Note that the condition of the theorem is satisfied, in
particular, if

    \mathrm{E}\,(\log^{+} \|X\|)^{1+\delta} < \infty

for some \delta > 0.
The results presented below in this section have been proved only in the
univariate case, so in the remainder of the section, we assume that m = 1.

The following two theorems, which are due to (Keller, 1988), deal with
the problem of large deviations of the empirical characteristic function. More
precisely, they give asymptotic expressions for the limit

    \lim_{n\to\infty} \frac{1}{n} \log P\Bigl( \sup_{t \in B} |f_n(t) - f(t)| > \varepsilon \Bigr), \qquad \varepsilon > 0,

where B is a compact set in the general case and B = R^1 if f(t) corresponds to
a purely discrete distribution.
First, let us prove an auxiliary lemma. Let \varepsilon > 0 and 0 \le p \le 1. Define

    J(p, \varepsilon) = (p + \varepsilon)\log\frac{p + \varepsilon}{p} + (1 - p - \varepsilon)\log\frac{1 - p - \varepsilon}{1 - p}, \qquad 0 < p < 1 - \varepsilon,

    J(0, \varepsilon) = +\infty, \qquad J(1 - \varepsilon, \varepsilon) = -\log(1 - \varepsilon), \qquad J(p, \varepsilon) = +\infty, \quad p > 1 - \varepsilon.

LEMMA 3.2.2. For any 0 < \varepsilon < 1, the relation

    \limsup_{n\to\infty} \frac{1}{n} \log P\Bigl( \sup_{-\infty < x < \infty} |F_n(x) - F(x)| > \varepsilon \Bigr) \le -\min\{J(p, \varepsilon) \colon 0 \le p \le 1 - \varepsilon\}

is true.

PROOF. Let U_1, U_2, ... be a sequence of independent and identically distributed
random variables having the uniform distribution on the interval [0, 1]. Denote
the empirical distribution function of the sample U_1, ..., U_n by G_n(x). Then X_i
and F^{-1}(U_i) are identically distributed. We hence obtain

    P\Bigl( \sup_{-\infty < x < \infty} |F_n(x) - F(x)| > \varepsilon \Bigr) = P\Bigl( \sup_{-\infty < x < \infty} |G_n(F(x)) - F(x)| > \varepsilon \Bigr)
    \le P\Bigl( \sup_{0 \le x \le 1} |G_n(x) - x| > \varepsilon \Bigr),

which proves the assertion of the lemma because

    \lim_{n\to\infty} \frac{1}{n} \log P\Bigl( \sup_{0 \le x \le 1} |G_n(x) - x| > \varepsilon \Bigr) = -\min\{J(p, \varepsilon) \colon 0 \le p \le 1 - \varepsilon\}.


Now let us introduce the random vector

    Y_j(t) = (\cos(tX_j) - u(t), \ \sin(tX_j) - v(t)).

Denote by M_t(\theta), \theta \in R^2, its Laplace transform

    M_t(\theta) = \mathrm{E}\, e^{(\theta, Y_j(t))},

and denote

    h_t(\varepsilon, \theta) = \inf_{r > 0} e^{-r\varepsilon} M_t(r\theta),

    i_t(\varepsilon) = \log\bigl( \sup\{ h_t(\varepsilon, \theta) \colon \theta \in R^2, \|\theta\| = 1 \} \bigr).

THEOREM 3.2.5. Let B be a compact set and

    i(\varepsilon) = \sup_{t \in B} i_t(\varepsilon).

Then

    \lim_{n\to\infty} \frac{1}{n} \log P\Bigl( \sup_{t \in B} |f_n(t) - f(t)| > \varepsilon \Bigr) = i(\varepsilon).

PROOF. If t \in B, by (Sethuraman, 1964, Theorem 7) we obtain

    i_t(\varepsilon) = \lim_{n\to\infty} \frac{1}{n} \log P\bigl( |f_n(t) - f(t)| > \varepsilon \bigr)
    \le \liminf_{n\to\infty} \frac{1}{n} \log P\Bigl( \sup_{t \in B} |f_n(t) - f(t)| > \varepsilon \Bigr).

Hence

    i(\varepsilon) \le \liminf_{n\to\infty} \frac{1}{n} \log P\Bigl( \sup_{t \in B} |f_n(t) - f(t)| > \varepsilon \Bigr).

Let k be an arbitrary positive integer. Cover the set B by a finite number
of open intervals B_1, ..., B_l of length 2/k each, with centers k_j \in B, j = 1, ..., l.
Setting

    S_{n,k} = \sup_{t_1, t_2 \in B,\ |t_1 - t_2| \le 1/k} |f_n(t_1) - f(t_1) - (f_n(t_2) - f(t_2))|,

we obtain

    \sup_{t \in B} |f_n(t) - f(t)| \le \max_{1 \le j \le l} |f_n(k_j) - f(k_j)| + S_{n,k}

for each n. Let \delta \in (0, \varepsilon) be given. Then

    \limsup_{n\to\infty} \frac{1}{n} \log P\Bigl( \sup_{t \in B} |f_n(t) - f(t)| > \varepsilon \Bigr)
    \le \limsup_{n\to\infty} \frac{1}{n} \log \Bigl[ P\Bigl( \max_{1 \le j \le l} |f_n(k_j) - f(k_j)| + S_{n,k} > \varepsilon,\ S_{n,k} \le \delta \Bigr)
    + P\Bigl( \max_{1 \le j \le l} |f_n(k_j) - f(k_j)| + S_{n,k} > \varepsilon,\ S_{n,k} > \delta \Bigr) \Bigr]
    \le \limsup_{n\to\infty} \frac{1}{n} \log \Bigl[ P\Bigl( \max_{1 \le j \le l} |f_n(k_j) - f(k_j)| > \varepsilon - \delta \Bigr) + P(S_{n,k} > \delta) \Bigr]
    \le \limsup_{n\to\infty} \frac{1}{n} \log \Bigl( 2 \max\Bigl\{ P\Bigl( \max_{1 \le j \le l} |f_n(k_j) - f(k_j)| > \varepsilon - \delta \Bigr),\ P(S_{n,k} > \delta) \Bigr\} \Bigr)
    = \max\Bigl\{ \limsup_{n\to\infty} \frac{1}{n} \log P\Bigl( \max_{1 \le j \le l} |f_n(k_j) - f(k_j)| > \varepsilon - \delta \Bigr),\ \limsup_{n\to\infty} \frac{1}{n} \log P(S_{n,k} > \delta) \Bigr\}.

Let us estimate the first limsup in this expression. We have

    \limsup_{n\to\infty} \frac{1}{n} \log P\Bigl( \max_{1 \le j \le l} |f_n(k_j) - f(k_j)| > \varepsilon - \delta \Bigr)
    \le \limsup_{n\to\infty} \frac{1}{n} \log \sum_{j=1}^{l} P\bigl( |f_n(k_j) - f(k_j)| > \varepsilon - \delta \bigr)
    \le \max_{1 \le j \le l} i_{k_j}(\varepsilon - \delta) \le i(\varepsilon - \delta).

The second limsup requires some calculations concerning S_{n,k}. Let t_1, t_2 \in B,
|t_1 - t_2| \le 1/k and \lambda > 0 be given, where \lambda is a continuity point of F(x) such
that

    1 - F(\lambda) + F(-\lambda) < \frac{\delta}{16}.
Then we obtain

    |f_n(t_1) - f(t_1) - (f_n(t_2) - f(t_2))| \le \Bigl| \int_{|x| \le \lambda} (e^{it_1 x} - e^{it_2 x})\, d(F_n(x) - F(x)) \Bigr|
    + \Bigl| \int_{|x| > \lambda} (e^{it_1 x} - e^{it_2 x})\, d(F_n(x) - F(x)) \Bigr|.

Now

    \Bigl| \int_{|x| > \lambda} (e^{it_1 x} - e^{it_2 x})\, d(F_n(x) - F(x)) \Bigr|
    \le \int_{|x| > \lambda} |e^{it_1 x} - e^{it_2 x}|\, dF_n(x) + \int_{|x| > \lambda} |e^{it_1 x} - e^{it_2 x}|\, dF(x)
    \le 2|F_n(\lambda) - F(\lambda)| + 2|F_n(-\lambda) - F(-\lambda)| + 4(1 - F(\lambda) + F(-\lambda)).

Let

    T = \max_{t \in B} |t|, \qquad \lambda' = 2\lambda(1 + \lambda T).

Using integration by parts, we obtain

    \Bigl| \int_{|x| \le \lambda} (e^{it_1 x} - e^{it_2 x})\, d(F_n(x) - F(x)) \Bigr|
    = \Bigl| (e^{it_1 \lambda} - e^{it_2 \lambda})(F_n(\lambda - 0) - F(\lambda)) - (e^{it_1(-\lambda)} - e^{it_2(-\lambda)})(F_n(-\lambda) - F(-\lambda))
    - i \int_{-\lambda}^{\lambda} (F_n(x) - F(x))(t_1 e^{it_1 x} - t_2 e^{it_2 x})\, dx \Bigr|
    \le 2|F_n(\lambda - 0) - F(\lambda)| + 2|F_n(-\lambda) - F(-\lambda)| + |t_1 - t_2|\, \lambda' \sup_{-\infty < x < \infty} |F_n(x) - F(x)|
    \le 2|F_n(\lambda - 0) - F(\lambda)| + 2|F_n(-\lambda) - F(-\lambda)| + \frac{\lambda'}{k} \sup_{-\infty < x < \infty} |F_n(x) - F(x)|.

Summing up, we obtain

    S_{n,k} \le 2|F_n(\lambda) - F(\lambda)| + 2|F_n(\lambda - 0) - F(\lambda)| + 4|F_n(-\lambda) - F(-\lambda)|
    + 4(1 - F(\lambda) + F(-\lambda)) + \frac{\lambda'}{k} \sup_{-\infty < x < \infty} |F_n(x) - F(x)|.

Hence,

    \limsup_{n\to\infty} \frac{1}{n} \log P\Bigl( \sup_{t \in B} |f_n(t) - f(t)| > \varepsilon \Bigr)

is bounded by the maximum of i(\varepsilon - \delta),

    \limsup_{n\to\infty} \frac{1}{n} \log P\Bigl( |F_n(\lambda) - F(\lambda)| > \frac{\delta}{16} \Bigr),

    \limsup_{n\to\infty} \frac{1}{n} \log P\Bigl( |F_n(\lambda - 0) - F(\lambda)| > \frac{\delta}{16} \Bigr),

    \limsup_{n\to\infty} \frac{1}{n} \log P\Bigl( |F_n(-\lambda) - F(-\lambda)| > \frac{\delta}{16} \Bigr),
and therefore,

    \limsup_{n\to\infty} \frac{1}{n} \log P\Bigl( \sup_{-\infty < x < \infty} |F_n(x) - F(x)| > \frac{k}{\lambda'}\Bigl( \frac{\delta}{4} - 4(1 - F(\lambda) + F(-\lambda)) \Bigr) \Bigr).

If we let first k and then \lambda tend to infinity, we obtain

    \limsup_{n\to\infty} \frac{1}{n} \log P\Bigl( \sup_{t \in B} |f_n(t) - f(t)| > \varepsilon \Bigr) \le i(\varepsilon - \delta).

This follows from the equality

    \lim_{n\to\infty} \frac{1}{n} \log P(|F_n(x) - F(x)| > \varepsilon)
    = -\min\{J(F(x), \varepsilon),\ J(1 - F(x), \varepsilon)\}, \qquad -\infty < x < \infty, \quad \varepsilon > 0,

and Lemma 3.2.2.


Finally, \delta \in (0, \varepsilon) was arbitrary, and i(\cdot) is left-continuous by Theo-
rem 7 and Lemma 3 of (Sethuraman, 1964). This implies the assertion of the
theorem.

If the underlying distribution F is discrete, then the condition of compact-
ness of B may be omitted.

THEOREM 3.2.6. Let B = R^1. If F is purely discrete and

    i(\varepsilon) = \sup_{t \in B} i_t(\varepsilon) > -\infty

for each \varepsilon > 0, then

    \lim_{n\to\infty} \frac{1}{n} \log P\Bigl( \sup_{t \in B} |f_n(t) - f(t)| > \varepsilon \Bigr) = i(\varepsilon).

PROOF. With the same conclusion as in the proof of Theorem 3.2.5, we obtain

    i(\varepsilon) \le \liminf_{n\to\infty} \frac{1}{n} \log P\Bigl( \sup_{t \in B} |f_n(t) - f(t)| > \varepsilon \Bigr).

Let X_j take values a_1, a_2, ... with probabilities p_1, p_2, .... Let \delta \in (0, \varepsilon), a positive
integer k_0 and \gamma > 0 be given. If X_j takes a finite number of values, say N, we
suppose that k_0 \le N. Denote

    g(t) = \sum_{k=1}^{k_0} |1 - e^{ita_k}|^2.

Since g(t) is almost periodic, there exists an L = L(\gamma^2) > 0 such that every
interval of the real line of length no smaller than L contains at least one \gamma^2-
almost period, i.e., a number \tau satisfying |g(t + \tau) - g(t)| < \gamma^2 for all real t.
Hence, if t is fixed, we can choose a \gamma^2-almost period \tau from the open interval
(-t, -t + L). Then we obtain

    |f_n(t) - f(t)| \le |f_n(t + \tau) - f(t + \tau)| + |f_n(t) - f(t) - (f_n(t + \tau) - f(t + \tau))|
    \le \sup_{0 \le t \le L} |f_n(t) - f(t)| + |f_n(t) - f(t) - (f_n(t + \tau) - f(t + \tau))|.    (3.2.8)
It follows from Theorem 3.2.5 that

    \limsup_{n\to\infty} \frac{1}{n} \log P\Bigl( \sup_{0 \le t \le L} |f_n(t) - f(t)| > \varepsilon - \delta \Bigr) \le i(\varepsilon - \delta).
Consider the second term of the right-hand side of (3.2.8). We have

    |f_n(t) - f(t) - (f_n(t + \tau) - f(t + \tau))|
    = \Bigl| \sum_{k=1}^{\infty} \Bigl( \frac{1}{n} \sum_{j=1}^{n} I_{\{X_j = a_k\}} - p_k \Bigr) e^{ita_k} (1 - e^{i\tau a_k}) \Bigr|
    \le \Bigl( \sum_{k=1}^{k_0} \Bigl( \frac{1}{n} \sum_{j=1}^{n} I_{\{X_j = a_k\}} - p_k \Bigr)^2 \Bigr)^{1/2} (g(\tau))^{1/2}
    + 2 \sum_{k=k_0+1}^{\infty} \Bigl| \frac{1}{n} \sum_{j=1}^{n} I_{\{X_j = a_k\}} - p_k \Bigr|
    \le \gamma \Bigl( \sum_{k=1}^{k_0} \Bigl( \frac{1}{n} \sum_{j=1}^{n} I_{\{X_j = a_k\}} - p_k \Bigr)^2 \Bigr)^{1/2}
    + 2 \sum_{k=k_0+1}^{\infty} \Bigl| \frac{1}{n} \sum_{j=1}^{n} I_{\{X_j = a_k\}} - p_k \Bigr|
    \le \frac{1}{n} \sum_{j=1}^{n} Z_j,

where we used the Cauchy-Schwarz inequality and the bound g(\tau) \le g(0) + \gamma^2 = \gamma^2
(recall that \tau is a \gamma^2-almost period of g and g(0) = 0), and where

    Z_j = \gamma \Bigl( \sum_{k=1}^{k_0} (I_{\{X_j = a_k\}} - p_k)^2 \Bigr)^{1/2} + 2 \sum_{k=k_0+1}^{\infty} |I_{\{X_j = a_k\}} - p_k|, \qquad 1 \le j \le n.

It follows from Theorem 3.1 of (Bahadur, 1960) that

    \limsup_{n\to\infty} \frac{1}{n} \log P\Bigl( \frac{1}{n} \sum_{j=1}^{n} Z_j > \delta \Bigr) = \inf_{r > 0} \bigl( \log \mathrm{E}\, e^{rZ_1} - r\delta \bigr).

Since t is arbitrary, the preceding inequalities yield

    \limsup_{n\to\infty} \frac{1}{n} \log P\Bigl( \sup_{-\infty < t < \infty} |f_n(t) - f(t)| > \varepsilon \Bigr) \le \max\bigl\{ i(\varepsilon - \delta),\ \log \mathrm{E}\, e^{rZ_1} - r\delta \bigr\}

for all r > 0. If we let first \gamma converge to zero, then k_0 tend to infinity (to N if
X takes a finite number of values), and finally let r tend to infinity, we obtain

    \limsup_{n\to\infty} \frac{1}{n} \log P\Bigl( \sup_{-\infty < t < \infty} |f_n(t) - f(t)| > \varepsilon \Bigr) \le i(\varepsilon - \delta).

The concluding step in the proof of Theorem 3.2.5 yields the desired result.

In conclusion of the section, we present a theorem due to (Devroye, 1994)
giving an upper bound for P\bigl( \sup_{|t| \le a} |f_n(t) - f(t)| > b \bigr) in terms of a and b (a
and b may depend on n).

THEOREM 3.2.7. Let the underlying distribution function F(x) have finite first
absolute moment \beta_1. Then, for a > 0, b > 0, where a and b depend on n in an
arbitrary fashion,

    P\Bigl( \sup_{|t| \le a} |f_n(t) - f(t)| > b \Bigr) \le 4\Bigl( 1 + \frac{8a\beta_1}{b} \Bigr) e^{-nb^2/72} + R_n,

where

    R_n = P\Bigl( \frac{1}{n} \sum_{i=1}^{n} |X_i| > \frac{4\beta_1}{3} \Bigr),

and R_n \to 0 as n \to \infty uniformly over all a and b.

PROOF. Set

    \gamma = \frac{b}{4\beta_1}.

We find numbers t_1 < t_2 < ... < t_k with the property that t_1 = -a, t_k = a,
|t_i - t_{i+1}| \le \gamma. Obviously, we can assure this with k \le 1 + 2a/\gamma. We begin with

    P\Bigl( \sup_{|t| \le a} |f_n(t) - f(t)| > b \Bigr) \le P\Bigl( \sup_{|t - s| \le \gamma} |f(t) - f(s)| > b/3 \Bigr)
    + P\Bigl( \sup_{|t - s| \le \gamma} |f_n(t) - f_n(s)| > b/3 \Bigr) + \sum_{i=1}^{k} P\bigl( |f_n(t_i) - f(t_i)| > b/3 \bigr).

Denote the three summands on the right-hand side by T_1, T_2, and T_3, and
estimate them. Note that

    |f(t) - f(s)| \le \mathrm{E}|1 - e^{i(t-s)X}| \le \mathrm{E}|(t - s)X| \le \gamma\beta_1 < b/3

for |t - s| \le \gamma. Therefore, T_1 = 0. Further, we let Y be the random variable
that puts mass 1/n at each of the X_i's. Then

    |f_n(t) - f_n(s)| \le \mathrm{E}|1 - e^{i(t-s)Y}| \le \mathrm{E}|(t - s)Y| = |t - s| \frac{1}{n} \sum_{i=1}^{n} |X_i|.

Therefore,

    T_2 \le P\Bigl( \gamma \frac{1}{n} \sum_{i=1}^{n} |X_i| > b/3 \Bigr) = P\Bigl( \frac{1}{n} \sum_{i=1}^{n} |X_i| > \frac{4\beta_1}{3} \Bigr) = R_n \to 0

by the law of large numbers. Finally, for fixed t_i,

    P\bigl( |f_n(t_i) - f(t_i)| > b/3 \bigr) \le P\bigl( |v_n(t_i) - v(t_i)| > b/6 \bigr)
    + P\bigl( |u_n(t_i) - u(t_i)| > b/6 \bigr) \le 4e^{-nb^2/72},

by Hoeffding's inequality for bounded random variables (Hoeffding, 1963),
where, as usual, u(t) and v(t) are the real and imaginary parts of f(t), and u_n(t)
and v_n(t) are those of f_n(t). This concludes the proof of the theorem.
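The Hoeffding step above can be checked by simulation. A sketch (ours, not from the book): u_n(t) is an average of cosines with range 2, so Hoeffding's inequality with deviation b/6 gives the bound 2 exp{-n(b/6)^2/2}.

```python
import numpy as np

# Monte Carlo check of the Hoeffding step: u_n(t) is an average of cosines
# (range 2), so P(|u_n(t) - u(t)| > b/6) <= 2 exp{-n (b/6)^2 / 2}.
rng = np.random.default_rng(3)
n, b, t, reps = 200, 1.5, 1.0, 2000
u = np.exp(-t**2 / 2)                       # u(t) for the N(0,1) law
x = rng.standard_normal((reps, n))
u_hat = np.cos(t * x).mean(axis=1)          # 2000 independent copies of u_n(t)
freq = np.mean(np.abs(u_hat - u) > b / 6)
bound = 2 * np.exp(-n * (b / 6) ** 2 / 2)
```

The observed exceedance frequency is far below the bound here, as expected: Hoeffding ignores the actual (much smaller) variance of the cosine average.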

3.3. The first positive zero


In this section, we consider only the univariate case.
Many statistical procedures based on the empirical characteristic func-
tion depend on a 'working interval' in the neighborhood of the origin which is
the wider the better, but must not contain zeros of the real part of an empir-
ical characteristic function, as well as those of the corresponding underlying
characteristic function, because otherwise either a test statistic is not defined
or some uniqueness conditions break. Therefore, of interest is the problem of
estimating the first positive zero of the real part of a characteristic function,
as well as information about the first positive zero of the real part of an
empirical characteristic function.

We now present some results concerning estimation of the first zero of
the real part of a characteristic function by the first zero of the real part of
the corresponding empirical characteristic function, and those concerning the
distribution of the latter.

Let X_1, ..., X_n be a random sample from a distribution function F(x) with
characteristic function f(t), and let F_n(x) and f_n(t) be the empirical distribution
function and empirical characteristic function of X_1, ..., X_n. As in the preced-
ing sections, denote the real and the imaginary parts of f(t) by u(t) and v(t),
respectively, and those of f_n(t) by u_n(t) and v_n(t). So,

    u_n(t) = \frac{1}{n} \sum_{j=1}^{n} \cos(tX_j).

We set

    r_0 = \min\{t > 0 \colon u(t) = 0\}

(the first positive zero of u(t); if r_0 does not exist, we write r_0 = \infty), and define
the random variable R_n as

    R_n = \min\{t > 0 \colon u_n(t) = 0\}

(the first positive zero of u_n(t)).
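R_n can be located numerically by scanning u_n for its first sign change and refining the bracket by bisection. A sketch (our own; the grid size and tolerances are arbitrary choices):

```python
import numpy as np

def u_n(t, sample):
    """Real part of the ECF: u_n(t) = (1/n) sum_j cos(t X_j)."""
    return np.cos(np.multiply.outer(t, np.asarray(sample))).mean(axis=-1)

def first_positive_zero(sample, t_max=50.0, grid=200000):
    """Approximate R_n = min{t > 0 : u_n(t) = 0}. Since u_n(0) = 1, the first
    sign change of u_n on a fine grid brackets R_n; bisection then refines it."""
    ts = np.linspace(0.0, t_max, grid)
    vals = u_n(ts, sample)
    below = np.nonzero(vals <= 0.0)[0]
    if below.size == 0:
        return None                       # no zero found up to t_max
    lo, hi = ts[below[0] - 1], ts[below[0]]
    for _ in range(60):                   # bisect the bracketing interval
        mid = 0.5 * (lo + hi)
        if u_n(np.array([mid]), sample)[0] > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# For the two-point sample {+1, -1}, u_n(t) = cos(t), so R_n = pi/2 exactly.
r = first_positive_zero(np.array([1.0, -1.0]))
```

For samples where u_n never crosses zero before t_max (possible, since R_n is random), the function returns None and the scan range must be enlarged.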


It is clear that R_n may exist and be finite almost surely even when r_0 does
not exist (r_0 = \infty). It turns out that the converse is also true: r_0 may exist
and be finite while R_n does not exist with positive probability which does not
depend on n (see Example 39 of Appendix A). Moreover, even when both r_0
and R_n do exist, are finite and isolated (R_n with probability one), R_n is not,
generally speaking, a consistent estimator of r_0. In fact, there does not exist an
estimator of r_0 which is consistent for all characteristic functions f(t) satisfying
the conditions

(a) r_0 exists, is finite and isolated;

(b) R_n almost surely exists, is finite and isolated for all n;

(c) u(t) takes both positive and negative values.

This is a consequence of the following fact: there exists a sequence of charac-
teristic functions f(t), g_1(t), g_2(t), ..., all satisfying conditions (a)-(c), and such
that

    \sup_t |g_n(t) - f(t)| \to 0 \quad \text{as } n \to \infty,

while

    r_0^{(n)} \not\to r_0 \quad \text{as } n \to \infty,

where r_0, r_0^{(1)}, r_0^{(2)}, ... are the first positive zeros of the real parts of
f(t), g_1(t), g_2(t), .... As an example of such a sequence, consider

    f(t) = p\,e^{-|t|} + (1 - p)\cos t,

where 0 < p < 1 is some appropriately chosen number, i.e., chosen in such a way
that f'(t) = 0 for t = r_0, and

    g_n(t) = (p + \varepsilon_n)\,e^{-|t|} + (1 - p - \varepsilon_n)\cos t,

where \varepsilon_n > 0, \varepsilon_n \to 0 as n \to \infty, and p + \varepsilon_n < 1, n = 1, 2, ....


However, R_n is a consistent estimator of r_0 under the additional condition
that u(t) decreases in some neighborhood of r_0.

THEOREM 3.3.1. If r_0 < \infty is an isolated zero of u(t), and u(t) decreases in some
neighborhood of r_0, then

    R_n \to r_0 \quad \text{almost surely as } n \to \infty.

PROOF. Fix an arbitrary \varepsilon > 0. In view of the uniform convergence of u_n(t)
to u(t) on each bounded interval (Theorem 3.2.1), u_n(t) > 0 almost surely for
all 0 \le t \le r_0 - \varepsilon and all sufficiently large n. On the other hand, due to
the same reason and since u(t) takes negative values at some points of the
interval (r_0, r_0 + \varepsilon), for all sufficiently large n, u_n(t_0) < 0 almost surely for some
t_0 \in (r_0, r_0 + \varepsilon); therefore R_n \in [r_0 - \varepsilon, r_0 + \varepsilon] almost surely for all sufficiently
large n. Since \varepsilon is arbitrary, this means that R_n \to r_0 almost surely as n \to \infty.

Since many procedures based on the empirical characteristic function be-
come unreliable if points beyond the first zero are used, it is important to have
methods of calculation or estimation of R_n given X_1, ..., X_n.

Some information on the position of R_n can be derived from sample mo-
ments on the basis of the results of Section 2.10. Since each realization of u_n(t) is
a symmetric characteristic function of a distribution whose variance coincides
with the corresponding realization of the random variable

    m_2 = \frac{1}{n} \sum_{j=1}^{n} X_j^2,

we immediately obtain from Theorem 2.10.1 the following estimate for R_n:

    R_n \ge \frac{\pi}{2\sqrt{m_2}} \quad \text{almost surely.}
This bound asymptotically requires

Other estimates, involving the (sample) first order absolute moment or both
the first and the second moments, follow from Theorems 2.10.2-2.10.4:

Rn > almost surely,



where

almost surely,
m 2m2

almost surely.

Now we present a simple explicit method of calculation of R_n which requires
only a fractional moment condition on F. Let s \in [0, R_n). Then, for any
t \in (s, R_n),

    |u_n(t) - u_n(s)| \le \frac{1}{n} \sum_{j=1}^{n} |\cos(tX_j) - \cos(sX_j)|
    = \frac{2}{n} \sum_{j=1}^{n} \Bigl| \sin\frac{(t - s)X_j}{2} \Bigr| \Bigl| \sin\frac{(t + s)X_j}{2} \Bigr|
    \le 2^{1-a} |t - s|^a m_a, \qquad 0 < a \le 1,

where

    m_a = \frac{1}{n} \sum_{j=1}^{n} |X_j|^a, \qquad 0 < a \le 1.

Thus, for t \in (s, R_n),

    u_n(t) \ge u_n(s) - 2^{1-a} |t - s|^a m_a.

The right-hand side is a lower approximation of u_n(t). Set T_{n,0} = 0 and

    T_{n,k+1} = T_{n,k} + \Bigl( \frac{u_n(T_{n,k})}{2^{1-a} m_a} \Bigr)^{1/a}, \qquad k = 0, 1, 2, ....
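A sketch of the recursion (our own code; the step formula is our reading of the partly garbled display: each step is the largest advance for which the lower bound above still guarantees u_n > 0, so the iterates increase and never overshoot R_n):

```python
import numpy as np

def u_n(t, sample):
    return np.cos(t * np.asarray(sample)).mean()

def lower_iterates(sample, a=0.5, steps=200):
    """T_{n,0} = 0, T_{n,k+1} = T_{n,k} + (u_n(T_{n,k}) / (2^{1-a} m_a))^{1/a}."""
    sample = np.asarray(sample, dtype=float)
    m_a = (np.abs(sample) ** a).mean()
    c = 2.0 ** (1.0 - a) * m_a
    t = 0.0
    for _ in range(steps):
        t += (max(u_n(t, sample), 0.0) / c) ** (1.0 / a)
    return t

# Two-point sample {+1, -1}: u_n(t) = cos(t), m_a = 1, and R_n = pi/2;
# the iterates increase towards pi/2 from below.
approx = lower_iterates(np.array([1.0, -1.0]))
```

The iterates converge monotonically but only polynomially fast near R_n, since the guaranteed step size shrinks with u_n(T_{n,k}).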
The properties of \{T_{n,k}\}_{k=0}^{\infty} are summarized in the following theorem.

THEOREM 3.3.2. (i) For each fixed n,

    T_{n,k} \to R_n \quad \text{as } k \to \infty.

(ii) If r_0 < \infty, u(t) decreases in some neighborhood of r_0, and

    \int_{-\infty}^{\infty} |x|^a\, dF(x) < \infty

for some 0 < a \le 1, then for N large enough

    \sup_{n \ge N} |T_{n,k} - R_n| \to 0 \quad \text{as } k \to \infty.

PROOF. (i) Notice that \{T_{n,k}\}_{k=0}^{\infty} is a monotone increasing sequence which is
bounded by R_n. Let b_n be any number in (0, R_n). Then set

    A_n = \min_{0 \le t \le b_n} u_n(t), \qquad \Delta_n = \Bigl( \frac{A_n}{2^{1-a} m_a} \Bigr)^{1/a}.

By the definition of R_n, A_n > 0 and T_{n,k} \ge b_n for k \ge [b_n/\Delta_n] + 1, where [x]
denotes the integer part of x.

(ii) Let > 0 be given. Choose > 0 such that

inf { > 0: u{t) - = 0} > r0 - /2,


inf {* > 0: u(t) + = 0} < r0 + /2.

Let be any compact subset of the real line such that [0, r + /2] c . Then,
due to the uniform convergence of the empirical characteristic function to the
corresponding characteristic function on each compact set (Theorem 3.2.1),
there exists an N\ such that, with probability as close to one as desired, for all
n>Nu
U(t) - < Un(t) < U{t) +

uniformly in t e B. Hence, with probability as close to one as desired, for all


n>Nu
r0-|<<r0 + |.

Also, by the strong law of large numbers, there exists an N2 such that, with
probability as close to one as desired, for all n > N 2

ma < + ,

where
oo
roo

/ -00 \x\adFM.

Therefore, with probability as close to one as desired, for > = m a x j i V i , ^ } ,

Unit)
nK) 1"" }
1 a
: 0<t <Rn } .
2 ~ ma

But > 0 by the choice of so that with probability as close to one as desired,
for all k > [(ro /2)/] + 1 ([] is the integer part),

\Tn,k n| <


The theorems below give some results concerning the distribution of R_n
depending on the behavior of the real part of the underlying characteristic
function. The proofs of the theorems and some additional results are contained
in (Heathcote & Hüsler, 1990; Bräker & Hüsler, 1991).

THEOREM 3.3.3. If r_0 is a finite isolated zero, u'(t) < 0 at t = r_0, and

    \int_{-\infty}^{\infty} |x|^a\, dF(x) < \infty

for some a > 0, then

    P(R_n \le r_n) - \Phi(z) \to 0 \quad \text{as } n \to \infty

for every z \in (-\infty, \infty), where

    r_n = r_0 + \frac{z\sigma}{|u'(r_0)|\sqrt{n}}, \qquad \sigma^2 = \frac{1}{2}(1 + u(2r_0) - 2u^2(r_0)),

and \Phi(z) is the standard normal distribution function.

THEOREM 3.3.4. Let u(t) = \exp\{-\beta|t|^a\} for some \beta > 0, 1 \le a \le 2. Then

    P(R_n \le r_n) \to \Phi(-z) \quad \text{as } n \to \infty

for z > 0, where

    r_n = \Bigl( \frac{\log n - 2\log z - \log 2}{2\beta} \Bigr)^{1/a}.
THEOREM 3.3.5. Let u(t) = \exp\{-\beta|t|^a\} for some \beta > 0, 1 \le a \le 2. Then, for

for some positive constant, and n large enough.

THEOREM 3.3.6. Let u(t) = exp{−β|t|^α} with 0 < α ≤ 1, β > 0. Then

R_n (2β/log n)^{1/α} → 1 as n → ∞.

Note that in Theorems 3.3.4–3.3.6 we have r0 = ∞.
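The estimator R_n = inf{t > 0: u_n(t) = 0} studied above is easy to sketch numerically. The following illustration is ours, not the book's: the helper names and the grid-plus-bisection search are our own choices. For the uniform law on [−1, 1] one has u(t) = sin(t)/t, so r0 = π.

```python
import math
import random

def ecf_real(sample, t):
    """Real part u_n(t) of the empirical characteristic function."""
    return sum(math.cos(t * x) for x in sample) / len(sample)

def first_positive_zero(sample, t_max=10.0, step=0.01):
    """R_n = inf{t > 0 : u_n(t) = 0}: scan a grid for the first sign
    change of u_n, then refine the bracket by bisection."""
    t_prev = 0.0                 # u_n(0) = 1 > 0 always
    t = step
    while t <= t_max:
        if ecf_real(sample, t) <= 0.0:
            lo, hi = t_prev, t
            for _ in range(60):  # bisection refinement
                mid = 0.5 * (lo + hi)
                if ecf_real(sample, mid) > 0.0:
                    lo = mid
                else:
                    hi = mid
            return 0.5 * (lo + hi)
        t_prev = t
        t += step
    return None                  # no zero located in (0, t_max]

random.seed(1)
sample = [random.uniform(-1.0, 1.0) for _ in range(5000)]
r_n = first_positive_zero(sample)
print(r_n)   # close to pi = r0 for the uniform law on [-1, 1]
```

The bisection is legitimate here because, for a fixed sample, u_n is a deterministic continuous function of t.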

3.4. Parameter estimation


In this section, we present several procedures of parameter estimation based
on empirical characteristic functions. Naturally, we start with estimating
parameters of the stable laws, since this problem was the starting point of sta-
tistical inference using empirical characteristic functions. Only the univariate
case is studied in this section.

Stable distributions are of interest from the viewpoint of many applications.


They are the only possible limiting laws for suitably normalized sums of independent, identically
distributed random variables. They have been applied in astronomy to model
gravitational fields (Holtsmark, 1919). In (Mandelbrot, 1960; Mandelbrot,
1963) it was suggested to use stable laws as possible models for the distribution
of income and speculative prices in business and economics. More information
on applications of the stable laws can be found in the monographs (Zolotarev,
1986; Zolotarev & Uchaikin, 1999).
There are several ways of defining stable laws. For our purposes it will be
convenient to define them in terms of their characteristic functions. A (one-
dimensional) distribution is called stable if its characteristic function f(t) can
be represented in the form

f(t) = f(t; α, β, γ, a) = exp{ iat − γ|t|^α [1 + iβ sign(t) ω(t, α)] },  −∞ < t < ∞,  (3.4.1)

where

ω(t, α) = tan(πα/2) if α ≠ 1,  ω(t, α) = (2/π) log|t| if α = 1,

0 < α ≤ 2, |β| ≤ 1, γ > 0, −∞ < a < ∞. The parameter α is called the
characteristic exponent, β is a measure of skewness, γ is the scale parameter,
and a is the location parameter.
Let X be a random variable with the characteristic function of form (3.4.1).
Denote its distribution function and density by S(x; α, β, γ, a) and s(x; α, β, γ, a),
respectively.
The inference problem with stable distributions is not straightforward; it
is complicated by the fact that their densities are not generally available in
closed form, making it difficult to apply conventional estimation techniques.
For this reason, characteristic function-based estimation seems to be
reasonable for the stable laws.
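As a quick illustration (ours, not the book's), representation (3.4.1) is straightforward to evaluate numerically; for α = 2 the tangent term vanishes and the formula reduces to the normal characteristic function exp{iat − γt²}.

```python
import cmath
import math

def stable_cf(t, alpha, beta, gamma, a):
    """Characteristic function (3.4.1) of a stable law:
    f(t) = exp{iat - gamma |t|^alpha [1 + i beta sign(t) omega(t, alpha)]}."""
    if t == 0.0:
        return 1.0 + 0.0j
    sgn = 1.0 if t > 0.0 else -1.0
    if alpha != 1.0:
        omega = math.tan(math.pi * alpha / 2.0)
    else:
        omega = (2.0 / math.pi) * math.log(abs(t))
    return cmath.exp(1j * a * t
                     - gamma * abs(t) ** alpha * (1.0 + 1j * beta * sgn * omega))

# alpha = 2: normal law, |f(t)| = exp(-gamma t^2)
print(abs(stable_cf(1.0, 2.0, 0.0, 0.5, 0.0)))   # exp(-0.5) ≈ 0.6065
# alpha = 1, beta = 0: symmetric Cauchy, |f(t)| = exp(-gamma |t|)
print(abs(stable_cf(2.0, 1.0, 0.0, 1.0, 0.0)))   # exp(-2) ≈ 0.1353
```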

3.4.1. Method of moments


The first method we consider in this section is a version of the method of mo-
ments proposed by (Press, 1972b). This is an analytic estimation procedure
which yields explicit estimators and involves minimal computation. Suppose
that the characteristic function (3.4.1), f(t; α, β, γ, a), is known at four different
non-zero points t1, t2, t3, t4. Then we have four equations for the four parame-
ters, from which we can derive expressions for the parameters via the values
f(t1), f(t2), f(t3), f(t4). The idea of the method consists of replacing the val-
ues of the characteristic function in these expressions by those of the empirical
characteristic function.

Denote θ = (α, β, γ, a). Let X1, ..., Xn be a random sample from the distribution
function S(x; θ), and, as usual, let fn(t) = fn(t; θ) be the empirical characteristic
function associated with the sample X1, ..., Xn.
Suppose that α ≠ 1 (the case α = 1 will be considered separately). It is
easy to see that
|f(t; θ)| = e^{−γ|t|^α};
hence

γ|t1|^α = −log|f(t1)|,
γ|t2|^α = −log|f(t2)|.

Solving these two equations simultaneously for α and γ, and replacing f(t) by
fn(t), we obtain the estimators

α̂ = [log(−log|fn(t1)|) − log(−log|fn(t2)|)] / [log|t1| − log|t2|],  (3.4.2)

γ̂ = exp{ [log|t1| log(−log|fn(t2)|) − log|t2| log(−log|fn(t1)|)] / [log|t1| − log|t2|] }  (3.4.3)

for the parameters α and γ.
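Estimators (3.4.2) and (3.4.3) can be sketched as follows (a minimal illustration of ours; the function names are hypothetical). For the test law we take the normal distribution, which is stable with α = 2 and γ = σ²/2.

```python
import math
import random

def ecf_abs(sample, t):
    """Modulus |f_n(t)| of the empirical characteristic function."""
    n = len(sample)
    re = sum(math.cos(t * x) for x in sample) / n
    im = sum(math.sin(t * x) for x in sample) / n
    return math.hypot(re, im)

def press_alpha_gamma(sample, t1, t2):
    """Moment estimators (3.4.2)-(3.4.3) for alpha and gamma."""
    L1 = math.log(-math.log(ecf_abs(sample, t1)))
    L2 = math.log(-math.log(ecf_abs(sample, t2)))
    d = math.log(abs(t1)) - math.log(abs(t2))
    alpha_hat = (L1 - L2) / d
    gamma_hat = math.exp((math.log(abs(t1)) * L2 - math.log(abs(t2)) * L1) / d)
    return alpha_hat, gamma_hat

random.seed(2)
sample = [random.gauss(0.0, 1.0) for _ in range(4000)]   # alpha = 2, gamma = 1/2
alpha_hat, gamma_hat = press_alpha_gamma(sample, 0.5, 1.0)
print(alpha_hat, gamma_hat)   # near 2 and 0.5
```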


Now, let us estimate β and a. Denote w(t) = ℑ(log f(t)). Then, from (3.4.1),
we obtain
w(t) = at − γβ|t|^{α−1} t ω(t, α),
and hence,

w(t_k)/t_k = a − γβ|t_k|^{α−1} tan(πα/2),  k = 3, 4.  (3.4.4)

Since
fn(t) = (1/n) Σ_{j=1}^n cos(tXj) + i (1/n) Σ_{j=1}^n sin(tXj),
in polar coordinates we have

fn(t) = ρn(t) e^{i wn(t)},

where

ρn²(t) = [(1/n) Σ_{j=1}^n cos(tXj)]² + [(1/n) Σ_{j=1}^n sin(tXj)]²

and

tan wn(t) = [Σ_{j=1}^n sin(tXj)] / [Σ_{j=1}^n cos(tXj)].

Hence,
log fn(t) = log ρn(t) + i wn(t),  wn(t) = ℑ(log fn(t)).
We choose the principal values of log fn(t_k), k = 3, 4, i.e., using principal values,
for k = 3, 4,

wn(t_k) = arctan{ [Σ_{j=1}^n sin(t_k Xj)] / [Σ_{j=1}^n cos(t_k Xj)] }.  (3.4.5)

Replacing w(t), α, and γ in (3.4.4) by their estimated values given in (3.4.2),
(3.4.3), and (3.4.5), and solving the two implied linear equations simultane-
ously for β and a, we obtain the estimators

β̂ = [wn(t3)/t3 − wn(t4)/t4] / { γ̂ tan(πα̂/2) [|t4|^{α̂−1} − |t3|^{α̂−1}] },  (3.4.6)

â = [|t4|^{α̂−1} wn(t3)/t3 − |t3|^{α̂−1} wn(t4)/t4] / [|t4|^{α̂−1} − |t3|^{α̂−1}].  (3.4.7)
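Formulas (3.4.5)–(3.4.7) can be sketched likewise (our illustration, with hypothetical function names; atan2 is used for the principal value, which agrees with the arctan ratio in (3.4.5) when the cosine sum is positive). For simplicity we plug in the true α and γ of the simulated law rather than their estimates; note that β̂ is numerically unstable when α is close to 2, since tan(πα/2) → 0 there.

```python
import math
import random

def w_n(sample, t):
    """Empirical w_n(t) of (3.4.5): the principal value of arg f_n(t)."""
    s = sum(math.sin(t * x) for x in sample)
    c = sum(math.cos(t * x) for x in sample)
    return math.atan2(s, c)

def press_beta_a(sample, t3, t4, alpha, gamma):
    """Moment estimators (3.4.6)-(3.4.7) for beta and the location a."""
    p3 = abs(t3) ** (alpha - 1.0)
    p4 = abs(t4) ** (alpha - 1.0)
    r3 = w_n(sample, t3) / t3
    r4 = w_n(sample, t4) / t4
    beta_hat = (r3 - r4) / (gamma * math.tan(math.pi * alpha / 2.0) * (p4 - p3))
    a_hat = (p4 * r3 - p3 * r4) / (p4 - p3)
    return beta_hat, a_hat

random.seed(3)
# N(1, 1) viewed as a stable law: alpha = 2, gamma = 1/2, a = 1
sample = [random.gauss(1.0, 1.0) for _ in range(4000)]
_, a_hat = press_beta_a(sample, 0.3, 0.6, 2.0, 0.5)
print(a_hat)   # near the location a = 1
```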

Turn now to the case α = 1. We have

|f(t)| = e^{−γ|t|},

i.e., γ = −log|f(t)|/|t|; therefore, for the parameter γ the estimator is

γ̂ = −log|fn(t1)| / |t1|.  (3.4.8)

Furthermore, for α = 1 (3.4.1) yields

w(t) = at − (2/π) γβ t log|t|,

i.e., for two non-zero values of t, say t2 and t3, t2 ≠ t3,

w(t_k)/t_k = a − (2/π) γβ log|t_k|,  k = 2, 3.  (3.4.9)

Solving equations (3.4.9) simultaneously for β and a, with w(t) replaced by
wn(t) and γ replaced by γ̂, we obtain the estimators for β and a:

β̂ = π[wn(t2)/t2 − wn(t3)/t3] / [2γ̂ log|t3/t2|],  (3.4.10)

â = [log|t3| wn(t2)/t2 − log|t2| wn(t3)/t3] / log|t3/t2|.  (3.4.11)

Thus, (3.4.2), (3.4.3), (3.4.6), and (3.4.7) yield moment estimators for the
parameters α, γ, β, and a in the case α ≠ 1; (3.4.8), (3.4.10), and (3.4.11) yield

moment estimators for γ, β, and a when α = 1. The estimators are consistent,
because the empirical characteristic function is a consistent (at each point t)
estimator of the corresponding characteristic function, and the expressions for
the parameters via the values f(t1), f(t2), f(t3), and f(t4) are continuous functions
of these values.
As was already mentioned, the described version of the method
of moments is very simple to apply. On the other hand, there is the problem of
the choice of appropriate values of t at which fn(t) is to be evaluated.
Confidence intervals can be obtained for the parameters when large sam-
ples are available. We derive asymptotic distributions for the case of the
symmetric stable distributions centered around the origin. Thus, we assume
that a = β = 0. In addition, we suppose that α ≠ 1.
Let, as usual, u(t) and v(t) be the real and the imaginary parts of f(t), and
un(t) and vn(t) be those of fn(t). Then

un(t) = ½[fn(t) + fn(−t)],  vn(t) = (1/2i)[fn(t) − fn(−t)],

and (3.4.2) and (3.4.3) can be rewritten as

α̂ = [L(t1) − L(t2)] / [log|t1| − log|t2|],  (3.4.12)

where

L(t) = log( −½ log(un²(t) + vn²(t)) ),  (3.4.13)

γ̂ = exp{ [log|t1| L(t2) − log|t2| L(t1)] / [log|t1| − log|t2|] }.  (3.4.14)

The estimators of α and γ are differentiable functions of the components of the
vector
Zn = (un(t1), un(t2), vn(t1), vn(t2)).
Since f(t) is symmetric, i.e., v(t) = 0, Zn has the expectation
EZn = (f(t1), f(t2), 0, 0) = (θ1, θ2, θ3, θ4)
and the covariance matrix

Σ = (σij), i, j = 1, ..., 4,

where (see relations (3.1.4)–(3.1.6))

σjj = (1/2n)[1 + f(2tj) − 2f²(tj)],  j = 1, 2,

σ33 = (1/2n)[1 − f(2t1)],  σ44 = (1/2n)[1 − f(2t2)],

σ12 = σ21 = (1/2n)[f(t1 + t2) + f(t1 − t2) − 2f(t1)f(t2)],

σ34 = σ43 = (1/2n)[f(t1 − t2) − f(t1 + t2)],

and σij = 0 for all other i and j.


Denote

A = nΣ.

Note that un(t) and vn(t) are sample means of independent and identically
distributed random variables with finite variances; therefore, the central limit
theorem implies that for fixed t1 and t2,

√n(Zn − EZn) ⇒ N(0, A) as n → ∞,

where N(a, A) is the normal distribution with expectation a and covariance
matrix A (in the univariate case, the second parameter stands for the variance).
We introduce the random function

g(Zn) = ω1 L(t1) + ω2 L(t2),

where L(t) is defined by (3.4.13). Then, from (3.4.12) and (3.4.14) we obtain

g(Zn) = α̂

for

ω1 = 1/(log|t1| − log|t2|) = −ω2,

and

g(Zn) = log γ̂

for

ω1 = −log|t2|/(log|t1| − log|t2|),  ω2 = log|t1|/(log|t1| − log|t2|).
From the large samples theory (see, e.g. (Rao, 1965, p. 321)),

√n(α̂ − α) ⇒ N(0, η1²) as n → ∞,  (3.4.15)

and

√n(log γ̂ − log γ) ⇒ N(0, η2²) as n → ∞,  (3.4.16)

where

ηj² = η²(ω1, ω2),  j = 1, 2,

with the weights (ω1, ω2) corresponding to α̂ for j = 1 and to log γ̂ for j = 2, and

η²(ω1, ω2) = Σ_{k=1}^{4} Σ_{l=1}^{4} a_{kl} [∂g(θ1, θ2, θ3, θ4)/∂θk][∂g(θ1, θ2, θ3, θ4)/∂θl],  (3.4.17)

where A = nΣ = (a_{kl}).

Now it only remains to evaluate the derivatives in (3.4.17). Since

g(θ1, θ2, θ3, θ4) = ω1 log( −½ log(θ1² + θ3²) ) + ω2 log( −½ log(θ2² + θ4²) ),

we obtain

∂g(θ1, θ2, θ3, θ4)/∂θj = ωj / [f(tj) log f(tj)],  j = 1, 2,

and

∂g(θ1, θ2, θ3, θ4)/∂θj = 0,  j = 3, 4.
Substituting the covariances and derivatives into (3.4.17) and simplifying, we
transform (3.4.15) and (3.4.16) into

√n (α̂ − α)/τn ⇒ N(0, 1) as n → ∞,  (3.4.18)

√n (log γ̂ − log γ)/νn ⇒ N(0, 1) as n → ∞,  (3.4.19)

where

τn² = [1 + |fn(2t1)| − 2|fn(t1)|²] / {2(|fn(t1)| log|fn(t1)| log|t1/t2|)²}
    + [1 + |fn(2t2)| − 2|fn(t2)|²] / {2(|fn(t2)| log|fn(t2)| log|t1/t2|)²}
    − [|fn(t1 + t2)| + |fn(t1 − t2)| − 2|fn(t1)||fn(t2)|]
      / {|fn(t1)||fn(t2)| log|fn(t1)| log|fn(t2)| (log|t1/t2|)²},  (3.4.20)

and

νn² = (1 + |fn(2t1)| − 2|fn(t1)|²)(log|t2|)² / {2(|fn(t1)| log|fn(t1)| log|t1/t2|)²}
    + (1 + |fn(2t2)| − 2|fn(t2)|²)(log|t1|)² / {2(|fn(t2)| log|fn(t2)| log|t1/t2|)²}
    − (|fn(t1 + t2)| + |fn(t1 − t2)| − 2|fn(t1)||fn(t2)|) log|t1| log|t2|
      / {|fn(t1)||fn(t2)| log|fn(t1)| log|fn(t2)| (log|t1/t2|)²}.  (3.4.21)

Equations (3.4.18)–(3.4.21) provide us with the required asymptotic distribu-
tions for α̂ and γ̂. Thus, if λ_{ε/2} denotes the ε/2-significance point for a standard
normal variate, then, with confidence coefficient 1 − ε, it follows for large samples
that

α̂ − λ_{ε/2} τn/√n < α < α̂ + λ_{ε/2} τn/√n,

and

γ̂ exp{−λ_{ε/2} νn/√n} < γ < γ̂ exp{λ_{ε/2} νn/√n}.
As mentioned, the described version of the method of moments is very
simple to apply. On the other hand, there is the problem of the choice of
appropriate t values at which fn(t) is to be evaluated. There are some other
disadvantages of the method; in particular, as pointed out in (Paulson et al.,
1975), it sometimes yields impossible results like α̂ < 0 or |β̂| > 1.

3.4.2. Projection methods;
the integrated squared error estimator

Now we consider the general case (not only the stable laws). Consider a class
of methods of estimation based on the following idea. Let F(x; θ), θ ∈ Θ ⊂ R^m,
be some parametric family of distribution functions, f(t; θ), θ ∈ Θ, be the
corresponding family of characteristic functions, and let ρ(·, ·) be a distance
in the space of characteristic functions. Suppose that X1, ..., Xn is a random
sample from the distribution function F(x; θ0), where θ0 ∈ Θ, and fn(t) is
the empirical characteristic function associated with the sample X1, ..., Xn.
The estimator in question is defined as that θ = θ̂n ∈ Θ which minimizes
ρ(f(t; θ), fn(t)):

ρ(f(t; θ̂n), fn(t)) = min_{θ∈Θ} ρ(f(t; θ), fn(t)).  (3.4.22)

Since, in the metric space of characteristic functions with metric ρ(·, ·),
f(t; θ̂n) is the projection of fn(t) onto the set {f(t; θ), θ ∈ Θ}, these methods
of estimation are called projection methods. Let us prove a general result
concerning the consistency of projection estimators based on the empirical
characteristic function.

THEOREM 3.4.1. Let a family f(t; θ), θ ∈ Θ, and a metric ρ satisfy the conditions:

(i) the convergence ρ(f(t; θn), f(t; θ)) → 0 as n → ∞ (θ, θ1, θ2, ... ∈ Θ) implies
θn → θ as n → ∞;

(ii) on the set of characteristic functions, the ρ-convergence is weaker than
the uniform convergence on each bounded interval.

Then the projection estimator θ̂n defined by (3.4.22) is strongly (almost surely)
consistent.

PROOF. We have

ρ(f(t; θ̂n), f(t; θ0)) ≤ ρ(f(t; θ̂n), fn(t)) + ρ(fn(t), f(t; θ0))
    ≤ 2ρ(fn(t), f(t; θ0)).

The right-hand side almost surely converges to zero as n → ∞ by virtue of
Theorem 3.2.1; therefore

ρ(f(t; θ̂n), f(t; θ0)) → 0 almost surely, n → ∞,

and hence
θ̂n → θ0 almost surely, n → ∞.


Note that, in particular, the first condition of the theorem is satisfied when
Θ is a compact set and F(x; θ) is uniquely determined by θ, that is, F(x; θ1) ≢
F(x; θ2) if θ1 ≠ θ2. Or this is so when Θ is not compact but the limits of the form
lim_{||θ||→∞} f(t; θ) and lim_{θ→θ*} f(t; θ), where θ* ∉ Θ, either do not exist or are not
characteristic functions (the case of stable laws).
Two examples of distances satisfying the conditions of the theorem are
given below. These are the weighted uniform distance

ρ(φ(t), ψ(t)) = sup_t w(t)|φ(t) − ψ(t)|,

where w(t) is a bounded positive function such that lim_{|t|→∞} w(t) = 0, and the
weighted Lp-metric

ρ(φ(t), ψ(t)) = { ∫ |φ(t) − ψ(t)|^p dG(t) }^{1/p},  p > 0,

where G(t) is an increasing function of bounded variation.


The projection estimator based on the minimization of the integrated
squared error

In(θ) = ∫_{−∞}^{∞} |f(t; θ) − fn(t)|² dG(t)

is especially important and widely used. Consider it in more detail. By virtue
of Theorem 3.4.1, it is strongly consistent under some rather weak conditions
on the family F(x; θ), θ ∈ Θ. Let us establish its asymptotic normality. Assume
that In(θ) can be differentiated with respect to θ under the integral sign.
With the prime denoting differentiation with respect to θ, the estimating
equation is (as before, u(t; θ), v(t; θ), un(t) and vn(t) are the real and imaginary
parts of f(t; θ) and those of fn(t))

−½ I′n(θ) = ∫_{−∞}^{∞} [(un(t) − u(t; θ))u′(t; θ) + (vn(t) − v(t; θ))v′(t; θ)] dG(t)

= (1/n) Σ_{j=1}^n ∫_{−∞}^{∞} [(cos(tXj) − u(t; θ))u′(t; θ) + (sin(tXj) − v(t; θ))v′(t; θ)] dG(t) = 0.  (3.4.23)

The estimator θ̂n is a root of (3.4.23) for which I″n(θ̂n) > 0.


Denote

Yj(θ) = ∫_{−∞}^{∞} [(cos(tXj) − u(t; θ))u′(t; θ) + (sin(tXj) − v(t; θ))v′(t; θ)] dG(t).

Then

I′n(θ) = −(2/n) Σ_{j=1}^n Yj(θ),

where (), ...,() are independent and identically distributed random vari-
ables with variance
oo poo
/
/ [cov(cos(iX), cos(sX))u'(t; e)u'(s; )
-oo J oo
+ 2 cov(cos(iX), sin(sX))u'(t; 0)t/(s; )
+ cov(sin(iX),sin(sX))i/(f; 0)i/(s; )] dG(t)dG(s), (3.4.24)
where, as it follows from elementary trigonometric identities or from (3.1.4)-
(3.1.6) for = 1,
cov(cos(iX), cos(sX)) = [u(t + s; ) + u(t - s; ) - 2u(t; 0)u(s; )],

cov(cos(tX), sin(sX)) = \ [( + s; ) - v(t - s; ) - 2u(t; 9)v(s; )],

cov(sin(tX), sin(sX)) = \ [u{t - s; ) - u(t + s; ) - 2 v(t; d)v(s; 0)].

From the central limit theorem we obtain

y/nl'n(6) ^ N(0,4 Var (0)\ oo, (3.4.25)

provided that Var () < oo. We set


D(θ) = ∫_{−∞}^{∞} |f′(t; θ)|² dG(t).  (3.4.26)

Suppose that f(t; θ) is two times differentiable with respect to θ and the func-
tions u″(t; θ) and v″(t; θ) are uniformly bounded by functions which are inte-
grable with respect to G. In this case the second derivative of In(θ) is

I″n(θ) = 2 ∫_{−∞}^{∞} [(u′(t; θ))² + (v′(t; θ))²
    − (un(t) − u(t; θ))u″(t; θ) − (vn(t) − v(t; θ))v″(t; θ)] dG(t);

hence

E I″n(θ0) = 2 ∫_{−∞}^{∞} |f′(t; θ0)|² dG(t) = 2D(θ0).

By the strong law of large numbers, this yields

I″n(θ0) → 2D(θ0) almost surely, n → ∞.  (3.4.27)


Let us use the Taylor expansion

I′n(θ̂n) = I′n(θ0) + (θ̂n − θ0) I″n(θ0 + λ(θ̂n − θ0))

with |λ| < 1. By (3.4.23), I′n(θ̂n) = 0; this yields

√n(θ̂n − θ0) = −√n I′n(θ0) / I″n(θ0 + λ(θ̂n − θ0)).  (3.4.28)

From (3.4.25), (3.4.27), and (3.4.28), we obtain the following assertion.



THEOREM 3.4.2. Let In(θ) be differentiable with respect to θ under the integral
sign, and let the functions u″(t; θ) and v″(t; θ) be uniformly bounded by some G-
integrable functions. Then

√n(θ̂n − θ0) ⇒ N(0, Var Y(θ0)/D²(θ0)) as n → ∞,

where Var Y(θ0) and D(θ0) are defined by (3.4.24) and (3.4.26).
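A sketch of the integrated squared error estimator for a one-parameter family (ours, not the book's): we take G discrete and uniform on a finite grid, the model f(t; θ) = exp(−θt²/2) (centered normal with variance θ), and minimize the discretized In(θ) by golden-section search. All function names are illustrative.

```python
import math
import random

def ecf(sample, t):
    """Empirical characteristic function f_n(t)."""
    n = len(sample)
    return complex(sum(math.cos(t * x) for x in sample) / n,
                   sum(math.sin(t * x) for x in sample) / n)

def ise_estimate(sample, lo, hi, grid, iters=50):
    """Minimize the discretized I_n(theta) = sum_t |f(t;theta) - f_n(t)|^2
    over theta in [lo, hi] by golden-section search; the model family is
    f(t; theta) = exp(-theta t^2 / 2)."""
    fn = [ecf(sample, t) for t in grid]          # f_n evaluated once
    def ise(theta):
        return sum(abs(math.exp(-theta * t * t / 2.0) - v) ** 2
                   for t, v in zip(grid, fn))
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = ise(c), ise(d)
    for _ in range(iters):
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = ise(c)
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = ise(d)
    return 0.5 * (a + b)

random.seed(4)
sample = [random.gauss(0.0, math.sqrt(2.0)) for _ in range(2000)]  # true theta = 2
grid = [0.2 * k for k in range(1, 11)]                             # t = 0.2, ..., 2.0
theta_hat = ise_estimate(sample, 0.5, 4.0, grid)
print(theta_hat)   # near the true variance theta = 2
```

Precomputing f_n on the grid reflects the structure of (3.4.22): the data enter only through finitely many values of the empirical characteristic function.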

3.4.3. An estimator of the scale parameter


In conclusion of this section, we briefly consider a method of estimating the
scale parameter developed by (Markatou et al., 1995; Markatou & Horowitz,
1995). Assume that X1, ..., Xn is a random sample from a distribution function
F((x − μ)/σ), where F(x) is a known absolutely continuous distribution function,
and μ and σ are unknown location and scale parameters. The characteristic
functions of F(x) and F((x − μ)/σ) are denoted by f0(t) and f(t), respectively, and
the empirical characteristic function associated with the sample X1, ..., Xn is
denoted, as usual, by fn(t).
The estimator is based on the idea that the absolute value of the character-
istic function is invariant to location and is qualitatively similar to a density
function with scale proportional to 1/σ. Therefore, a measure of 'width' of the
modulus of the empirical characteristic function can be used to estimate σ.
We have |f(t)| = |f0(σt)|. Since F is assumed to be absolutely continuous,
i.e., |f0(t)| → 0 as |t| → ∞, |f0(t)| takes all values between 0 and 1. Given any
fixed c, 0 < c < 1, denote

t_{0c} = inf{t > 0: |f0(t)| = c},

t_c = inf{t > 0: |f(t)| = c}.

Then
t_c = t_{0c}/σ,
i.e.,
σ = t_{0c}/t_c.
Since F is supposed to be known, t_{0c} is known as well; there-
fore, the problem of estimating σ is reduced to the problem of estimating t_c.
Moreover, if t̂_c is a consistent estimator of t_c, then σ is estimated consistently
by
σ̂ = t_{0c}/t̂_c.
The problem of estimating t_c is similar to the problem of estimating the
first positive zero studied in Section 3.3. The results are also similar; namely,
it is reasonable to estimate t_c by the estimator

t̂_c = inf{t > 0: |fn(t)| = c}.


Then the proposed estimator of σ is

σ̂ = t_{0c}/t̂_c.

THEOREM 3.4.3. Let the conditions

(i) F(x) is absolutely continuous;

(ii) t_c ∈ B, where B is a known compact set;

(iii) |f(t)| decreases in some neighborhood of t_c

be satisfied. Then
σ̂ → σ almost surely as n → ∞,
i.e., σ̂ is an almost surely consistent estimator of σ.
Details can be found in (Markatou et al., 1995; Markatou & Horowitz,
1995).
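A numerical sketch of this estimator (ours, with hypothetical function names): for known F = standard normal, |f0(t)| = exp(−t²/2), so t_{0c} = √(−2 log c); t̂_c is located as the first down-crossing of the level c by |fn|, in the same grid-plus-bisection manner as the zero estimator of Section 3.3. The location μ does not affect the estimate, since the modulus of the characteristic function is invariant to location.

```python
import math
import random

def ecf_abs(sample, t):
    """|f_n(t)| for the empirical characteristic function."""
    n = len(sample)
    re = sum(math.cos(t * x) for x in sample) / n
    im = sum(math.sin(t * x) for x in sample) / n
    return math.hypot(re, im)

def first_crossing(f, c, t_max=20.0, step=0.01):
    """Approximate inf{t > 0 : f(t) = c} for f with f(0+) > c: find the
    first grid point where f drops to c or below, refine by bisection."""
    t_prev, t = 0.0, step
    while t <= t_max:
        if f(t) <= c:
            lo, hi = t_prev, t
            for _ in range(50):
                mid = 0.5 * (lo + hi)
                if f(mid) > c:
                    lo = mid
                else:
                    hi = mid
            return 0.5 * (lo + hi)
        t_prev = t
        t += step
    return None

c = 0.5
t0c = math.sqrt(-2.0 * math.log(c))      # t_{0c} for the standard normal
random.seed(5)
sample = [random.gauss(5.0, 2.0) for _ in range(3000)]   # mu = 5, sigma = 2
tc_hat = first_crossing(lambda t: ecf_abs(sample, t), c)
sigma_hat = t0c / tc_hat
print(sigma_hat)   # near 2
```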

3.5. Non-parametric density estimation I


Due to the close and simple relationship between characteristic functions and
the corresponding densities (the inversion formula for densities, the Parseval-
Plancherel identity), many concepts of non-parametric density estimation can
be expressed and many results can be obtained by means of characteristic
functions with no less success than by means of density functions. Moreover,
although fundamental results of the modern theory of non-parametric density
estimation have been obtained without use of the characteristic function, in
many cases the approach based on the characteristic function possesses many
advantages compared with other methods.
In this and the following sections, we study a field of non-parametric den-
sity estimation where the use of characteristic functions seems to be almost
inevitable, at least at the moment. This is the derivation of sharp bounds for
the error of estimation for finite values of the sample size n. These bounds
are particularly useful when one needs to know the guaranteed accuracy of an
estimate for a given (finite) value of n.
We mainly deal with one class of non-parametric density estimators,
kernel estimators (or some of their generalizations). These estimators are most
popular today and probably most effective. In this section, we consider only
conventional kernels, that is, kernels which are probability density functions.
This ensures that each realization of the corresponding estimator is a prob-
ability density. Other kernels have been rarely used because it is regarded
that 'a density should be estimated by a density' (see, e.g. (Devroye & Györfi,
1985)). Not contradicting this point, we however believe that non-conventional

defect of the corresponding estimators should be corrected afterwards, and in
such a way that does not make their approximating properties worse.
This problem is considered in the following section.
First we express the basic concepts of kernel density estimation in
terms of characteristic functions.
Let X1, ..., Xn be independent and identically distributed random variables
with absolutely continuous distribution function F(x), density function p(x),
and characteristic function f(t). The kernel density estimator associated with
the sample X1, ..., Xn is defined as
pn(x) = pnh(x) = (1/(nh)) Σ_{j=1}^n K((x − Xj)/h),  (3.5.1)

where K(x) is a measurable real-valued function called the kernel (for the time
being, we do not assume that K(x) is a density function; this restriction will
be introduced a little later), and h = hn is a positive number (depending on n)
called the bandwidth or the smoothing parameter. Generally speaking, K(x)
is not supposed to be integrable (moreover, the best approximations often cor-
respond to non-integrable kernels). However, we suppose that K(x) is square
integrable. In addition, we will assume that K(x) is integrable in the sense of
the Cauchy principal value (see page 6) and

∫_{−∞}^{∞} K(x) dx = 1.

Under these assumptions, the Fourier transform of K(x) can be defined as

φ(t) = ∫_{−∞}^{∞} e^{itx} K(x) dx

(see (Titchmarsh, 1937, Chapter 4)).
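Definition (3.5.1) with a conventional (density) kernel can be sketched as follows (our illustration, with the Gaussian kernel and the n^{−1/5} bandwidth rate that appears later in Theorem 3.5.1; the constant h0 = 1 is an arbitrary choice):

```python
import math
import random

def gaussian_kernel(u):
    """Standard normal density, used as the kernel K(x)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(sample, x, h, K=gaussian_kernel):
    """Kernel density estimator (3.5.1): (1/(n h)) sum_j K((x - X_j)/h)."""
    n = len(sample)
    return sum(K((x - xj) / h) for xj in sample) / (n * h)

random.seed(6)
sample = [random.gauss(0.0, 1.0) for _ in range(2000)]
h = len(sample) ** (-0.2)     # h_n = h0 n^{-1/5} with h0 = 1
print(kde(sample, 0.0, h))    # near the true value 1/sqrt(2 pi) ≈ 0.3989
```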
Now we introduce some basic characteristics of a density estimator which
are of frequent use and which we will deal with below. Let pn(x) be an esti-
mator (not necessarily a kernel estimator) of p(x) associated with the sample
X1, ..., Xn. The bias of pn(x) is (we try to make the notation close to that used in
the literature on density estimation)

Bn(pn(x)) = E pn(x) − p(x).  (3.5.2)

In the case of the kernel estimator pn(x) defined by (3.5.1), the bias is written
as
Bn(pn(x)) = (Kh * p)(x) − p(x),
where

Kh(x) = (1/h) K(x/h),  (3.5.3)

and (Kh * p)(x) is the convolution:

(Kh * p)(x) = ∫_{−∞}^{∞} Kh(x − y) p(y) dy.
Since convolution can be considered as some kind of smoothing, the bias of
the kernel estimator is the difference between the smoothed density and the
density itself.
The mean squared error of an estimator pn(x) is defined as

MSE(pn(x)) = E[pn(x) − p(x)]².  (3.5.4)

It admits a simple decomposition into variance and squared bias:

MSE(pn(x)) = Bn²(pn(x)) + Var(pn(x)).  (3.5.5)

The mean squared error depends on x and locally characterizes the deviation
of an estimator from a density to be estimated. Integrating MSE over all x, we
obtain the integrated measure of the deviation of an estimator from a density
to be estimated, the mean integrated squared error

MISE(pn(x)) = ∫_{−∞}^{∞} MSE(pn(x)) dx = E ∫_{−∞}^{∞} [pn(x) − p(x)]² dx.  (3.5.6)

In view of (3.5.5),

MISE(pn(x)) = ∫_{−∞}^{∞} Bn²(pn(x)) dx + ∫_{−∞}^{∞} Var(pn(x)) dx.
Along with MSE and MISE, other measures of the deviation are used.
Among them the mean absolute error

MAE(pn(x)) = E|pn(x) − p(x)|

and, respectively, the mean integrated absolute error

MIAE(pn(x)) = E ∫_{−∞}^{∞} |pn(x) − p(x)| dx

are especially important (see (Devroye & Györfi, 1985)). However, in this book,
we will consider only MSE and MISE.
In the case of the kernel estimator, the variance, MSE, and MISE of an
estimator can be easily expressed in terms of the function Kh(x) defined by
(3.5.3) and a density to be estimated:

Var(pn(x)) = (1/n)[(Kh² * p)(x) − (Kh * p)²(x)],

MSE(pn(x)) = (1/n)[(Kh² * p)(x) − (Kh * p)²(x)] + [(Kh * p)(x) − p(x)]²,

MISE(pn(x)) = (1/n) ∫_{−∞}^{∞} [(Kh² * p)(x) − (Kh * p)²(x)] dx
    + ∫_{−∞}^{∞} [(Kh * p)(x) − p(x)]² dx.

Let g(x) be a real-valued function. We use the notation

μk(g) = ∫_{−∞}^{∞} |x|^k g(x) dx,  k = 1, 2, ...,

R(g) = ∫_{−∞}^{∞} g²(x) dx,

provided that these integrals exist. If the kernel K(x) is a probability density
function, and a density to be estimated is two times differentiable and its
second derivative is square integrable, then the asymptotic relation for MISE
holds (see, e.g. (Wand & Jones, 1995))

inf_{h>0} MISE(pn(x)) ~ (5/4)[μ2²(K)R⁴(K)R(p″)]^{1/5} n^{−4/5},  n → ∞.

Thus the best order of approximation is n^{−4/5} if only density functions are used
for the kernel. However, if we permit the kernel not to be a density, then the
order can be improved. For example, if p(x) is the normal density and K(x) is
the sinc kernel, i.e., K(x) = sin(x)/(πx), then

inf_{h>0} MISE(pn(x)) = O(√(log n)/n),  n → ∞.
Now we express some basic characteristics of density estimators in terms
of Fourier transforms and establish some auxiliary results.
Let f̂n(t) denote the Fourier transform of an estimator pn(x). Making use
of the inversion formula for densities, the Parseval–Plancherel identity, and
relations (3.5.2), (3.5.4), and (3.5.6), we easily obtain the following:

Bn(pn(x)) = (1/2π) ∫_{−∞}^{∞} e^{−itx}[E f̂n(t) − f(t)] dt,  (3.5.7)

MSE(pn(x)) = E| (1/2π) ∫_{−∞}^{∞} e^{−itx}[f̂n(t) − f(t)] dt |²
    = (1/(2π)²) ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{−i(u+v)x} E[(f̂n(u) − f(u))(f̂n(v) − f(v))] du dv,  (3.5.8)

MISE(pn(x)) = (1/2π) ∫_{−∞}^{∞} E|f̂n(t) − f(t)|² dt.  (3.5.9)

In the remainder of this section, we will consider only kernel estimators


and suppose that the kernel K(x) is a probability density function, i.e., it is non-
negative and integrates to one. We derive some upper bounds for MISE and
MSE depending on the choice of a kernel and upon an appropriate choice of the
bandwidth. Naturally, the results depend on the degree of smoothness of the
estimated density; therefore we separately study several cases. As usual, fn(t)
denotes the empirical characteristic function associated with the sample X1, ..., Xn.

The general form of the kernel estimator (3.5.1) in terms of the empirical
characteristic function is

pnh(x) = (1/2π) ∫_{−∞}^{∞} e^{−itx} fn(t) φ(hnt) dt,

where φ is the characteristic function of the kernel:

φ(t) = ∫_{−∞}^{∞} e^{itx} K(x) dx.

The characteristic function of the estimator pnh(x) is fn(t)φ(hnt).
For the kernel estimator, taking into account that

E fn(u)fn(v) = (1/n) f(u + v) + (1 − 1/n) f(u)f(v),

E|fn(t)|² = 1/n + (1 − 1/n)|f(t)|²,

we can rewrite (3.5.7)–(3.5.9) as

Bn(pn(x)) = (1/2π) ∫_{−∞}^{∞} e^{−itx} f(t)(φ(hnt) − 1) dt,  (3.5.10)

MSE(pnh(x)) = (1/(2π)²) ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{−i(u+v)x} [ (1/n) φ(hnu)φ(hnv) f(u + v)
    + ( (1 − 1/n) φ(hnu)φ(hnv) − φ(hnu) − φ(hnv) + 1 ) f(u)f(v) ] du dv,  (3.5.11)

MISE(pnh(x)) = (1/2π) [ ∫_{−∞}^{∞} |f(t)|²|1 − φ(hnt)|² dt
    + (1/n) ∫_{−∞}^{∞} |φ(hnt)|²(1 − |f(t)|²) dt ].  (3.5.12)

From (3.5.10) we immediately obtain

|Bn(pn(x))| ≤ (1/2π) ∫_{−∞}^{∞} |f(t)||1 − φ(hnt)| dt.  (3.5.13)
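Representation (3.5.12) is convenient for exact MISE computations. As a check (ours, not the book's), for the standard normal density, f(t) = e^{−t²/2}, and the Gaussian kernel, φ(t) = e^{−t²/2}, all integrals in (3.5.12) are Gaussian and can be evaluated in closed form; a numerical quadrature of (3.5.12) should reproduce the closed form.

```python
import math

def mise_numeric(n, h, t_max=12.0, m=4000):
    """MISE from (3.5.12) for f(t) = exp(-t^2/2) and phi(t) = exp(-t^2/2),
    by the trapezoidal rule on [0, t_max], doubled by symmetry."""
    def integrand(t):
        f2 = math.exp(-t * t)                    # |f(t)|^2
        phi = math.exp(-h * h * t * t / 2.0)     # phi(h t)
        return f2 * (1.0 - phi) ** 2 + phi * phi * (1.0 - f2) / n
    dt = t_max / m
    s = 0.5 * (integrand(0.0) + integrand(t_max))
    for k in range(1, m):
        s += integrand(k * dt)
    return 2.0 * s * dt / (2.0 * math.pi)

def mise_exact(n, h):
    """The same MISE after evaluating the Gaussian integrals in (3.5.12)."""
    c = 1.0 / (2.0 * math.sqrt(math.pi))
    bias2 = c * (1.0 - 2.0 / math.sqrt(1.0 + h * h / 2.0)
                 + 1.0 / math.sqrt(1.0 + h * h))
    var = (c / n) * (1.0 / h - 1.0 / math.sqrt(1.0 + h * h))
    return bias2 + var

print(mise_numeric(1000, 0.25))
print(mise_exact(1000, 0.25))   # the two agree to high accuracy
```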

LEMMA 3.5.1. If p(x) is bounded:

sup_x p(x) ≤ a < ∞,

then

sup_x MSE(pn(x)) ≤ [ (1/2π) ∫_{−∞}^{∞} |f(t)||1 − φ(hnt)| dt ]²
    + (a/(πn)) ∫_{−∞}^{∞} |φ(hnt)| dt.  (3.5.14)

PROOF. Making use of relation (3.5.11), we obtain

MSE(pn(x)) = [ (1/2π) ∫_{−∞}^{∞} e^{−itx} f(t)(1 − φ(hnt)) dt ]²
    + (1/n)(1/(2π)²) ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{−i(u+v)x} φ(hnu)φ(hnv) f(u + v) du dv
    − (1/n)(1/(2π)²) ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{−i(u+v)x} φ(hnu)φ(hnv) f(u)f(v) du dv.

The first term on the right-hand side is dominated by the first term of the
right-hand side of (3.5.14). Let us estimate the absolute values of the second
(denoted by T2) and third (denoted by T3) terms. We have

T2 = (1/n)(1/2π) ∫_{−∞}^{∞} φ(hnu) [ (1/2π) ∫_{−∞}^{∞} e^{−i(u+v)x} φ(hnv) f(u + v) dv ] du.

The term in the square brackets, being transformed to the form
(1/2π) ∫_{−∞}^{∞} e^{−itx} φ(hn(t − u)) f(t) dt, is equal to ∫_{−∞}^{∞} p(x − y) Kh(y) e^{−iuy} dy
(since φ(hn(t − u))f(t) is the Fourier transform of the convolution of the
functions p(x) and Kh(x)e^{−iux}), and we have

| ∫_{−∞}^{∞} p(x − y) Kh(y) e^{−iuy} dy | ≤ ∫_{−∞}^{∞} p(x − y) Kh(y) dy ≤ a ∫_{−∞}^{∞} Kh(y) dy = a.

Hence,

|T2| ≤ (a/n)(1/2π) ∫_{−∞}^{∞} |φ(hnt)| dt.

Furthermore,

|T3| = (1/n) | (1/2π) ∫_{−∞}^{∞} e^{−iux} φ(hnu) f(u) du | · | (1/2π) ∫_{−∞}^{∞} e^{−ivx} f(v) φ(hnv) dv |
    ≤ (1/n) sup_y | (1/2π) ∫_{−∞}^{∞} e^{−iuy} f(u) φ(hnu) du | · (1/2π) ∫_{−∞}^{∞} |f(v)||φ(hnv)| dv
    ≤ (1/n) sup_y (Kh * p)(y) · (1/2π) ∫_{−∞}^{∞} |φ(hnv)| dv
    ≤ (a/n)(1/2π) ∫_{−∞}^{∞} |φ(hnv)| dv

(we used the obvious inequality sup_y (Kh * p)(y) ≤ sup_y p(y)). Thus we finally
arrive at (3.5.14).

LEMMA 3.5.2. The bound

MISE(pnh(x)) ≤ (1/2π) ∫_{−∞}^{∞} |f(t)|²|1 − φ(hnt)|² dt + (1/(2πnhn)) ∫_{−∞}^{∞} |φ(t)|² dt

is true.

The lemma immediately follows from relation (3.5.12).


First we study the 'smooth' case where a density to be estimated is one or
several times differentiable.

THEOREM 3.5.1. Let p(x) be twice differentiable, and p″(x) be a function of
bounded variation:

V_{−∞}^{∞}(p″) = V2 < ∞.

If the kernel K(x) has zero expectation, and hn = h0 n^{−1/5} (h0 is some constant),
then

MISE(pn(x)) ≤ [ (3/(10π)) μ2²(K) V2^{5/3} h0⁴ + R(K)/h0 ] n^{−4/5}.  (3.5.15)

PROOF. By virtue of Theorem 2.5.4, we have

|f(t)| ≤ V2/|t|³,  (3.5.16)

and by virtue of Theorem 2.3.3,

|1 − φ(t)| ≤ ½ μ2(K) t²

for all t. Hence,

(1/2π) ∫_{−∞}^{∞} |f(t)|²|1 − φ(hnt)|² dt
    ≤ (1/π) [ ∫_0^{V2^{1/3}} ¼ μ2²(K) hn⁴ t⁴ dt + ∫_{V2^{1/3}}^{∞} (V2²/t⁶) ¼ μ2²(K) hn⁴ t⁴ dt ]
    = (3/(10π)) μ2²(K) V2^{5/3} hn⁴ = (3/(10π)) μ2²(K) V2^{5/3} h0⁴ n^{−4/5}.  (3.5.17)

Further, using the Parseval–Plancherel identity (Theorem 1.2.10), we obtain

(1/(2πnhn)) ∫_{−∞}^{∞} |φ(t)|² dt = (1/h0) n^{−4/5} (1/2π) ∫_{−∞}^{∞} |φ(t)|² dt  (3.5.18)
    = (1/h0) n^{−4/5} ∫_{−∞}^{∞} K²(x) dx = (R(K)/h0) n^{−4/5}.  (3.5.19)

From (3.5.17), (3.5.19), and Lemma 3.5.2, we obtain (3.5.15).



COROLLARY 3.5.1. Let p(x) be twice differentiable, p″(x) be a function of bound-
ed variation, and K(x) have zero expectation and be bounded:

sup_x K(x) ≤ b < ∞.

If hn = h0 n^{−1/5}, then

COROLLARY 3.5.2. Let the conditions of Theorem 3.5.1 be satisfied. Then for
each n = 1, 2, ...,

inf_{h>0} MISE(pn(x)) ≤ (3·5⁴/(512π))^{1/5} [μ2²(K)R⁴(K)]^{1/5} V2^{1/3} n^{−4/5}.

The approximate value of the absolute constant on the right-hand side is

(3·5⁴/(512π))^{1/5} ≈ 1.0311368.
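The constant in the corollary comes from minimizing the right-hand side of (3.5.15) in h0: for A h0⁴ + B/h0 the minimizer is h0* = (B/(4A))^{1/5} and the minimum equals (5/4^{4/5}) A^{1/5} B^{4/5}. A quick numerical check (ours, with arbitrary A and B):

```python
import math

# Constant of Corollary 3.5.2
const = (3.0 * 5.0 ** 4 / (512.0 * math.pi)) ** 0.2
print(const)   # ≈ 1.0311368

# Minimizing A h^4 + B/h over h > 0
A, B = 0.7, 1.3                          # arbitrary positive values
h_star = (B / (4.0 * A)) ** 0.2
closed = (5.0 / 4.0 ** 0.8) * A ** 0.2 * B ** 0.8
grid_min = min(A * h ** 4 + B / h
               for h in (h_star * (1.0 + 0.001 * k) for k in range(-5, 6)))
print(abs(grid_min - closed) < 1e-9)     # the closed form is the minimum
```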

If p(x) is only once differentiable and/or the expectation of K(x) does
not equal zero, then the results are weaker.

THEOREM 3.5.2. Let p(x) be differentiable, and p′(x) be a function of bounded
variation:

V_{−∞}^{∞}(p′) = V1 < ∞.

If hn = h0 n^{−1/3}, then

MISE(pn(x)) ≤ [ (4/(3π)) μ1²(K) V1^{3/2} h0² + R(K)/h0 ] n^{−2/3}.  (3.5.20)

PROOF. By virtue of Theorem 2.5.4 and the remark after Theorem 2.3.3,

|f(t)| ≤ V1/t²,

and

|1 − φ(t)| ≤ μ1(K)|t|

for all t. Hence (see the proof of Theorem 3.5.1),

∫_{−∞}^{∞} |f(t)|²|1 − φ(hnt)|² dt ≤ (8/3) μ1²(K) V1^{3/2} h0² n^{−2/3}.  (3.5.21)

And, as in the proof of Theorem 3.5.1,

(1/(2πnhn)) ∫_{−∞}^{∞} |φ(t)|² dt = (R(K)/h0) n^{−2/3}.  (3.5.22)

From (3.5.21), (3.5.22), and Lemma 3.5.2, we obtain (3.5.20).


COROLLARY 3.5.3. Let the conditions of Theorem 3.5.2 be satisfied. Then for
each n = 1, 2, ...,

inf_{h>0} MISE(pn(x)) ≤ (9/π)^{1/3} μ1^{2/3}(K) √V1 [R(K)]^{2/3} n^{−2/3}.

Theorems 3.5.1 and 3.5.2 provide us with bounds for the integrated devia-
tion of the mean squared error of a kernel estimator from zero. Now we obtain
bounds for the sup-deviation. Denote

B(K) = (1/2π) ∫_{−∞}^{∞} |φ(t)| dt.

THEOREM 3.5.3. Let p(x) be three times differentiable, p‴(x) be a function of
bounded variation:

V_{−∞}^{∞}(p‴) = V3 < ∞,

and

sup_x p(x) ≤ a < ∞.

If hn = h0 n^{−1/5}, then

sup_x MSE(pn(x)) ≤ [ (4/(9π²)) μ2²(K) V3^{3/2} h0⁴ + 2aB(K)/h0 ] n^{−4/5}.

PROOF. By virtue of Theorem 2.5.4,

|f(t)| ≤ V3/t⁴,

and, by virtue of Theorem 2.3.3,

|1 − φ(t)| ≤ ½ μ2(K) t²

for all t. Hence

∫_{−∞}^{∞} |f(t)||1 − φ(hnt)| dt
    ≤ 2 [ ½ μ2(K) hn² ∫_0^{V3^{1/4}} t² dt + ½ μ2(K) hn² V3 ∫_{V3^{1/4}}^{∞} t^{−2} dt ]
    = (4/3) μ2(K) V3^{3/4} hn².

To obtain the result, it now suffices to apply Lemma 3.5.1.



COROLLARY 3.5.4. Let the conditions of Theorem 3.5.3 be satisfied. Then for
each n = 1, 2, ...,

inf_{h>0} sup_x MSE(pn(x)) ≤ (5⁵/(36π²))^{1/5} μ2^{2/5}(K) V3^{3/10} B^{4/5}(K) a^{4/5} n^{−4/5}.

THEOREM 3.5.4. Let p(x) be twice differentiable, p″(x) be a function of bounded
variation, and

sup_x p(x) ≤ a < ∞.

If hn = h0 n^{−1/3}, then

sup_x MSE(pn(x)) ≤ [ (9/(4π²)) μ1²(K) V2^{4/3} h0² + 2aB(K)/h0 ] n^{−2/3}.  (3.5.23)

PROOF. Use (3.5.16) and the inequality from the remark after Theorem 2.3.3:

|1 − φ(t)| ≤ μ1(K)|t|.

Then

∫_{−∞}^{∞} |f(t)||1 − φ(hnt)| dt ≤ 2 [ μ1(K)hn ∫_0^{V2^{1/3}} t dt + μ1(K)hn V2 ∫_{V2^{1/3}}^{∞} t^{−2} dt ]
    = 3 μ1(K) V2^{2/3} hn.

Using this bound and Lemma 3.5.1, we arrive at (3.5.23).

COROLLARY 3.5.5. Let the conditions of Theorem 3.5.4 be satisfied. Then for
each n = 1, 2, ...,

inf_{h>0} sup_x MSE(pn(x)) ≤ 3 (9/(4π²))^{1/3} μ1^{2/3}(K) V2^{4/9} B^{2/3}(K) a^{2/3} n^{−2/3}.

Now we consider the so-called non-smooth case. This means that the
underlying density function is not assumed to be differentiable or even con-
tinuous. Some regularity condition, however, must be introduced (otherwise
nothing substantial can be derived). Throughout this and the following sections,
this condition will be the boundedness of the total variation of the underlying
density. Note that this condition is a little less restrictive than those usually
assumed in the non-smooth case (see, e.g. (van Eeden, 1985; van Es, 1997)).

THEOREM 3.5.5. Let the underlying density p(x) be a function of bounded vari-
ation: V = V(p) < ∞. If

hn = h0/(√n log n),

then

MISE(pnh(x)) ≤ (4√2/π)(log² n/(√n log log n)) max{μ1(K), √μ1(K)}
    × max{V^{3/2}, V²} max{√h0, h0} + (R(K)/h0)(log n/√n),  (3.5.24)

provided that n ≥ e^e.

PROOF. Let us use Lemma 3.5.2. For the second term in the square brackets,
by the Parseval–Plancherel identity, we obtain

(1/(2πnhn)) ∫_{−∞}^{∞} |φ(t)|² dt = (1/(nhn)) ∫_{−∞}^{∞} K²(x) dx = R(K)/(nhn).  (3.5.25)

Let us estimate the first term. First establish the following inequality: for any
0 < a < 1,

|1 − φ(t)| ≤ 2^{1−a} μ1^a(K)|t|^a  (3.5.26)

for all real t. Indeed, in view of the remark after Theorem 2.3.3,

|1 − φ(t)| ≤ μ1(K)|t|.  (3.5.27)

For |t| ≤ 2μ1^{−1}(K), the right-hand side of (3.5.26) majorizes the right-hand side
of (3.5.27); therefore (3.5.26) is true for these t. If |t| > 2μ1^{−1}(K), then (3.5.26)
becomes obvious because its right-hand side exceeds two.
Let a be arbitrary, 0 < a < 1/2. Making use of (3.5.26) and Theorem 2.5.3,
we obtain

∫_{−∞}^{∞} |f(t)|²|1 − φ(hnt)|² dt
    = 2 ∫_0^V |f(t)|²|1 − φ(hnt)|² dt + 2 ∫_V^{∞} |f(t)|²|1 − φ(hnt)|² dt
    ≤ 2^{3−2a} μ1^{2a}(K) hn^{2a} [ ∫_0^V t^{2a} dt + V² ∫_V^{∞} t^{2a−2} dt ]
    = (2^{4−2a}/(1 − 4a²)) μ1^{2a}(K) V^{2a+1} hn^{2a}.

From this bound and (3.5.25), using Lemma 3.5.2, we obtain

MISE(pnh(x)) ≤ (2^{3−2a}/(π(1 − 4a²))) μ1^{2a}(K) V^{2a+1} hn^{2a} + R(K)/(nhn)
    = (2^{3−2a}/(π(1 − 4a²))) μ1^{2a}(K) V^{2a+1} h0^{2a} (1/(√n log n))^{2a}
    + (R(K)/h0)(log n/√n)  (3.5.28)

for any a ∈ (0, 1/2). Set

    a = log n / (2(log n + 2 log log n));
then 1/4 < a < 1/2 (provided that n > e^e), and hence

    2^{3−2a} ≤ 4√2,    f_a(K) ≤ max{f_{1/4}(K), f_{1/2}(K)},

    V^{2a+1} ≤ max{V^{3/2}, V²},    h₀^{2a} ≤ max{√h₀, h₀},

therefore from (3.5.28) we obtain

    MISE(p_n(x)) ≤ (4√2/(π(1 − 4a²))) max{f_{1/4}(K), f_{1/2}(K)}² max{V^{3/2}, V²} max{√h₀, h₀} (1/√(n log n))^{2a}
                   + R(K)√(log n)/(h₀√n).    (3.5.29)

Set

    a₀ = (log n − 2 log log n) / (2(log n + log log n));

then a ≥ a₀ (if n > e^e); hence

    (1/√(n log n))^{2a} ≤ (1/√(n log n))^{2a₀} = (log n)/√n.    (3.5.30)

It remains to estimate 1/(1 − 4a²). We have

    1/(1 − 4a²) = (log n + 2 log log n)² / ((log n + 2 log log n)² − (log n)²)
                = (log n + 2 log log n)² / ((2 log n + 2 log log n) · 2 log log n)
                ≤ (log n)/(log log n)    (3.5.31)

if n > e^e.

From (3.5.29), (3.5.30), and (3.5.31) we finally obtain (3.5.24).

COROLLARY 3.5.6. Let p(x) be a unimodal density function with

    sup_x p(x) ≤ a < ∞.

If

    h_n = h₀ / √(n log n),

then

    MISE(p_n(x)) ≤ (4√2 (log n)² / (π√n log log n)) max{f_{1/4}(K), f_{1/2}(K)}² max{2√2 a^{3/2}, 4a²} max{√h₀, h₀}
                   + R(K)√(log n)/(h₀√n).

3.6. Non-parametric density estimation II


In this section, we continue studying the applications of the characteristic
function approach to the problem of non-parametric density estimation. We
consider the same problem as in the previous section: the derivation of upper
bounds for the error of approximation, but now we study kernels which are not
density functions and may be neither non-negative nor integrable.
In kernel density estimation, there are a number of methods based on
high-order kernels, superkernels, the sine kernel, semiparametric methods, etc.
(Parzen, 1962; Bartlett, 1963; Davis, 1975; Devroye, 1992; Hjort & Glad,
1995; Glad et al., 1999c; Wand & Jones, 1995), which are good in many
respects but have one essential disadvantage: they produce estimates which
are not probability density functions, i.e., may take negative values and/or do
not integrate to one. This is quite a drawback because it limits the application
of these estimators in practice, although they usually give better approximation
than conventional estimators whose kernels are densities. This is particularly
true for the estimator based on the sine kernel.
For this reason, we start with some methods of modification of density
estimators (all estimators, not only kernel estimators) in such a way that
the resulting estimator always produces estimates which are almost surely
probability density functions and, in addition, is better than, or at least almost
as good as, the initial one (it is always better for high-order kernel, superkernel,
and sine kernel estimators). By 'better' we mean better in the sense of MISE.
Thus, let X₁, ..., X_n be independent and identically distributed random vari-
ables with absolutely continuous distribution function F(x), density p(x), and
characteristic function f(t), and let p_n(x) be an estimator of p(x) associated with
the sample X₁, ..., X_n. Suppose that all or some realizations of p_n(x) may not
be probability densities, i.e., can take negative values and/or do not integrate
to one. Our objective is to correct these defects of p_n(x), i.e., to construct a
transformation p*_n(x) of p_n(x) which is free of the defects mentioned above and
satisfies the condition

    MISE(p*_n(x)) ≤ MISE(p_n(x))

or, at least, the condition

    MISE(p*_n(x)) ~ MISE(p_n(x)),    n → ∞.



Before constructing new estimators, let us make the following remark. The
most tempting way of overcoming the troubles related to a possible negativity
and non-normalization (or even non-integrability) of an estimator seems to be
the replacement of the estimator by its projection onto the set of all probability
densities (or onto some subset of this set, if some preliminary information
about a density to be estimated is available). In other words, let ℱ be some
space of functions to which the estimator under consideration almost surely belongs,
and let 𝒟 be the set of all probability density functions. Suppose that 𝒟 ⊂ ℱ,
and that ℱ is a metric space with metric ρ(·, ·). The estimator p_n(x) is replaced by
the estimator p̂_n(x) such that, for any realization of p_n(x), the corresponding
realization of p̂_n(x) satisfies the conditions

(1) p̂_n(x) ∈ 𝒟;

(2) ρ(p_n, p̂_n) ≤ ρ(p_n, q) for any q(x) ∈ 𝒟.

After this replacement we obtain an estimator which is always a density,
and which does not lose much in the rate of convergence to the estimated den-
sity p(x) compared with the initial estimator p_n(x), because for each realization
of p_n(x),

    ρ(p̂_n, p) ≤ ρ(p̂_n, p_n) + ρ(p_n, p) ≤ 2ρ(p_n, p).
However, there is a serious difficulty in directly applying the projection ap-
proach: given a realization of p_n(x), it is not clear how to find its projection
onto 𝒟 (to obtain a formula or to construct an algorithm). Therefore, it seems
reasonable to use certain 'approximations' of the projection method. It turns
out that some of these approximations give even better results than the 'unal-
tered' projection method.
Below, we separately consider the cases where

    ∫_{-∞}^{∞} max{0, p_n(x)} dx ≥ 1    (3.6.1)

and where

    ∫_{-∞}^{∞} max{0, p_n(x)} dx ≤ 1,    (3.6.2)

and construct for them the modifications p̃_n(x) and p̄_n(x), respectively. If p_n(x)
almost surely satisfies one of relations (3.6.1) or (3.6.2), then p*_n(x) = p̃_n(x) or
p*_n(x) = p̄_n(x), depending on which relation holds. If some realizations of the
estimator p_n(x) satisfy (3.6.1) while others satisfy (3.6.2), then we can use the
following combination of the estimators p̃_n(x) and p̄_n(x):

    p*_n(x) = { p̃_n(x), if (3.6.1) holds,
              { p̄_n(x), if (3.6.2) holds.

3.6.1. ∫_{-∞}^{∞} max{0, p_n(x)} dx ≥ 1

Assume that the initial estimator p_n(x) is almost surely bounded, square inte-
grable, and satisfies condition (3.6.1), including the case

    ∫_{-∞}^{∞} max{0, p_n(x)} dx = ∞.

This case (condition (3.6.1)) seems to be more important than the case of the
converse inequality (3.6.2); in particular, it holds for the sine kernel and
superkernels.
Our proposal for the new, modified, estimator is

    p̃_n(x) = max{0, p_n(x) − ε},    (3.6.3)

where ε is chosen in such a way that

    ∫_{-∞}^{∞} p̃_n(x) dx = 1.

Below, we prove that, at least in the sense of MISE, the estimator p̃_n(x) is
always at least as good as the initial estimator p_n(x). First, let us show that p̃_n(x) is
well defined, i.e., that ε always exists and is unique.

LEMMA 3.6.1. Let q(x) be a bounded, square integrable function. Then the
function

    q_ε(x) = max{0, q(x) − ε}

is integrable for any ε > 0.

PROOF. Denote

    b = sup_x q(x)

and

    A_ε = {x: q(x) > ε}.

Due to the conditions of the lemma, b < ∞. It is easy to see that A_ε has a finite
Lebesgue measure. Indeed, if mes A_ε = ∞, then

    ∫_{-∞}^{∞} q²(x) dx ≥ ∫_{A_ε} q²(x) dx ≥ ε² mes A_ε = ∞

(mes denotes the Lebesgue measure), which contradicts the assumption that
q(x) is square integrable.
We have q_ε(x) = 0 if x ∉ A_ε; therefore

    ∫_{-∞}^{∞} q_ε(x) dx = ∫_{A_ε} q_ε(x) dx ≤ b mes A_ε < ∞.    □


We set

    Q(z) = ∫_{-∞}^{∞} max{0, q(x) − z} dx.

Let z₁ > z₂ > 0. We have

    (z₁ − z₂) mes A_{z₁} ≤ Q(z₂) − Q(z₁) ≤ (z₁ − z₂) mes A_{z₂},

which yields the following assertion.

LEMMA 3.6.2. The function Q(z) is continuous for z > 0 and strictly decreases
for 0 < z < b.

Lemmas 3.6.1 and 3.6.2 imply that ε exists, and each realization of p_n
uniquely determines the corresponding realization of ε, i.e., p̃_n is well defined.
By the definition of p̃_n(x), each realization of it is a probability density
function. We prove now that the MISE of p̃_n(x) (as an estimator of p(x)) is at
least as good as that of p_n(x) for any n.
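In practice, the level ε can be found numerically by bisection on the decreasing function Q(ε) = ∫ max{0, p_n(x) − ε} dx. A minimal sketch on a discretized estimate (the grid treatment and the function name are illustrative, not from the text):

```python
def normalize_excess(p_vals, dx, tol=1e-10):
    # Modification (3.6.3) on a grid: find eps >= 0 such that the clipped
    # estimate max(0, p - eps) integrates to one.  Q(eps) is continuous and
    # strictly decreasing (Lemmas 3.6.1-3.6.2), so bisection applies.
    def q(eps):
        return sum(max(0.0, v - eps) for v in p_vals) * dx

    lo, hi = 0.0, max(p_vals)  # Q(0) >= 1 by (3.6.1), Q(sup p) = 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if q(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    eps = 0.5 * (lo + hi)
    return eps, [max(0.0, v - eps) for v in p_vals]

# Hypothetical example: an estimate equal to 1.2 on [0, 1] (total mass 1.2)
dx = 0.001
vals = [1.2] * 1000
eps, corrected = normalize_excess(vals, dx)
total = sum(corrected) * dx  # equals 1 up to the bisection tolerance
```

Here eps comes out as 0.2 and the clipped estimate integrates to one, in line with Lemma 3.6.2.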

THEOREM 3.6.1. Let p_n(x) be an arbitrary estimator of a density function p(x)
satisfying the conditions:

(1) p_n(x) is almost surely bounded and square integrable;

(2) almost surely,

    ∫_{-∞}^{∞} max{0, p_n(x)} dx ≥ 1;    (3.6.4)

this includes the case where

    ∫_{-∞}^{∞} max{0, p_n(x)} dx = ∞.

Then for any n, almost surely,

    ∫_{-∞}^{∞} (p̃_n(x) − p(x))² dx ≤ ∫_{-∞}^{∞} (p_n(x) − p(x))² dx,    (3.6.5)

where p̃_n(x) is the estimator given by (3.6.3).

PROOF. Consider an arbitrary realization of p_n(x). Without loss of generality,
assume that this realization (we use the same notation p_n(x) for it) is square
integrable and satisfies (3.6.4). If ε = 0 for this realization, then p̃_n(x) =
max{0, p_n(x)}, and for every x,

    |p̃_n(x) − p(x)| ≤ |p_n(x) − p(x)|,

which implies (3.6.5); so we will assume that ε > 0 for the realization consid-
ered. This means, in particular, that

    ∫_{-∞}^{∞} max{0, p_n(x)} dx > 1.    (3.6.6)

Let us fix an arbitrary δ, 0 < δ < 1. Choose a set A_m (depending on δ) on the real
line, having a sufficiently large Lebesgue measure m > 0, so that

    ∫_{R¹\A_m} p²(x) dx < δ,    (3.6.7)

    ∫_{A_m} max{0, p_n(x)} dx > 1,    (3.6.8)

    p_n(x) < ε for x ∈ R¹\A_m    (3.6.9)

(the last two relations are meant for the considered realization of p_n(x); the
first of them is possible due to (3.6.6), the second one is possible due to the
square integrability of p_n).
Now, let us construct the sequence of functions {q_k(x)}, k = 0, 1, 2, ..., as
follows:

    q₀(x) = p_n(x),    q_k(x) = max{0, q_{k−1}(x)} − c_k,    k = 1, 2, ...,

where

    c_k = (1/m) ∫_{A_m} (max{0, q_{k−1}(x)} − p(x)) dx,

i.e., c_k is a number such that

    ∫_{A_m} q_k(x) dx = ∫_{A_m} p(x) dx.    (3.6.10)

Note that c_k ≥ 0 for all k.

Let us prove that

    ∫_{A_m} (q_k(x) − p(x))² dx ≤ ∫_{A_m} (q_{k−1}(x) − p(x))² dx,    k = 1, 2, ....    (3.6.11)

Indeed, we obviously have

    ∫_{A_m} (max{0, q_{k−1}(x)} − p(x))² dx ≤ ∫_{A_m} (q_{k−1}(x) − p(x))² dx.    (3.6.12)
Consider the function

    ψ(c) = ∫_{A_m} (max{0, q_{k−1}(x)} − c − p(x))² dx.

Its derivative is

    ψ′(c) = −2 ∫_{A_m} (max{0, q_{k−1}(x)} − c − p(x)) dx

and is equal to zero for c = c_k, which implies that ψ attains its minimum on
the half-line c ≥ 0 at c = c_k, i.e.,

    ∫_{A_m} (q_k(x) − p(x))² dx ≤ ∫_{A_m} (max{0, q_{k−1}(x)} − p(x))² dx.    (3.6.13)

From (3.6.12) and (3.6.13) we obtain (3.6.11).
Observe now that

    q_k(x) + c_k = max{0, q₀(x) − Σ_{j=1}^{k−1} c_j},    k = 1, 2, ...,    (3.6.14)

which can be proved by induction: this is obviously true for k = 1, and if it is
true for k = m, then

    q_{m+1}(x) + c_{m+1} = max{0, q_m(x)}
        = max{c_m, q_m(x) + c_m} − c_m
        = max{c_m, max{0, q₀(x) − Σ_{j=1}^{m−1} c_j}} − c_m
        = max{c_m, q₀(x) − Σ_{j=1}^{m−1} c_j} − c_m
        = max{0, q₀(x) − Σ_{j=1}^{m} c_j},

i.e., it is true for k = m + 1.


Relation (3.6.14) implies that

    Σ_{j=1}^{∞} c_j < ∞,

because otherwise

    lim sup_{k→∞} ∫_{A_m} q_k(x) dx ≤ lim_{k→∞} ∫_{A_m} max{0, q₀(x) − Σ_{j=1}^{k−1} c_j} dx = 0,

which contradicts (3.6.10) and (3.6.7). Letting k → ∞ on both sides of (3.6.14),
we obtain

    lim_{k→∞} q_k(x) = max{0, q₀(x) − Σ_{j=1}^{∞} c_j},

where the limit is both in the uniform and L² norms. Denote

    q_M(x) = lim_{k→∞} q_k(x).

We have

    q_M(x) = max{0, q₀(x) − Σ_{j=1}^{∞} c_j}    (3.6.15)

and

    p̃_n(x) = max{0, q₀(x) − ε}    (3.6.16)

(for the considered realization of p_n(x)). By (3.6.9),

    p̃_n(x) = 0 for x ∈ R¹\A_m,    (3.6.17)

hence

    ∫_{A_m} p̃_n(x) dx = ∫_{-∞}^{∞} p̃_n(x) dx = 1,    (3.6.18)

while, due to (3.6.10),

    ∫_{A_m} q_M(x) dx = ∫_{A_m} p(x) dx < 1;    (3.6.19)

therefore, (3.6.15) and (3.6.16) yield

    Σ_{j=1}^{∞} c_j > ε,

which implies that

    q_M(x) = 0 for x ∈ R¹\A_m,    (3.6.20)

and therefore,

    q_M(x) ≤ p̃_n(x)    (3.6.21)

for all x. Moreover, in view of (3.6.11),

    ∫_{A_m} (q_M(x) − p(x))² dx ≤ ∫_{A_m} (q₀(x) − p(x))² dx = ∫_{A_m} (p_n(x) − p(x))² dx;

hence, taking (3.6.7) into account, we obtain

    ∫_{-∞}^{∞} (q_M(x) − p(x))² dx = ∫_{A_m} (q_M(x) − p(x))² dx + ∫_{R¹\A_m} p²(x) dx
        ≤ ∫_{A_m} (p_n(x) − p(x))² dx + δ
        ≤ ∫_{-∞}^{∞} (p_n(x) − p(x))² dx + δ.    (3.6.22)

The transition from q_M(x) to p̃_n(x) is carried out by the passage to the
limit as δ → 0, m → ∞, and A_m → R¹. More exactly, let δ decrease and
A_m increase (the latter in the sense A_{m₁} ⊂ A_{m₂} for m₁ < m₂), and let δ → 0,
m → ∞, and A_m → R¹ (in the sense ∪_m A_m = R¹) in such a way that conditions
(3.6.7)-(3.6.9) are satisfied. Below, all lim's and lim sup's are taken as δ → 0,
m → ∞, and A_m → R¹. In view of (3.6.19), we have

    lim ∫_{A_m} q_M(x) dx = lim ∫_{A_m} p(x) dx = 1.

From this relation and (3.6.18), taking (3.6.17), (3.6.20), and (3.6.21) into ac-
count and setting s = sup_x p_n(x) (by the conditions of the theorem, s < ∞), we
obtain

    lim sup ∫_{-∞}^{∞} (p̃_n(x) − q_M(x))² dx ≤ lim sup ∫_{-∞}^{∞} (p̃_n(x) − q_M(x))(p̃_n(x) + q_M(x)) dx
        ≤ 2s lim sup ( ∫_{-∞}^{∞} p̃_n(x) dx − ∫_{-∞}^{∞} q_M(x) dx ) = 0,

i.e.,

    lim ∫_{-∞}^{∞} (p̃_n(x) − q_M(x))² dx = 0.    (3.6.23)

On the other hand, (3.6.22) yields

    lim sup ∫_{-∞}^{∞} (q_M(x) − p(x))² dx ≤ ∫_{-∞}^{∞} (p_n(x) − p(x))² dx.    (3.6.24)

From (3.6.23) and (3.6.24) we obtain (3.6.5).    □

COROLLARY 3.6.1. Let the conditions of Theorem 3.6.1 be satisfied. Then for
any n,

    MISE(p̃_n(x)) ≤ MISE(p_n(x)).

3.6.2. ∫_{-∞}^{∞} max{0, p_n(x)} dx ≤ 1
The results in the case (3.6.2) are not as good as in the case (3.6.1), but they are
quite sufficient for applications. The estimator p̄_n(x) introduced below for the case
(3.6.2) is not 'always better' than the initial estimator p_n(x) (as the estimator
p̃_n(x) is), but it is 'almost as good as' p_n(x).
Let p_n(x) be an estimator of a density function p(x), and let (3.6.2) hold
almost surely. Assume that p_n(x) is almost surely square integrable. Our
proposal for the estimator p̄_n(x) (it is supposed to depend on a parameter
M = M(n) > 0) is

    p̄_n(x) = p̄_n(x; M) = { max{0, p_n(x)} + η_M,  |x| ≤ M,
                          { max{0, p_n(x)},        |x| > M,    (3.6.25)

where

    η_M = (1/(2M)) (1 − ∫_{-∞}^{∞} max{0, p_n(x)} dx).

All realizations of the estimator p̄_n(x) are non-negative and, as is easily
seen,

    ∫_{-∞}^{∞} p̄_n(x) dx = 1.

Let us demonstrate that, taking M dependent on n, we can make the MISE of
p̄_n(x) as close to that of p_n(x) as desired, both asymptotically and for all finite
n. This follows from the theorem below.
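On a grid, the modification (3.6.25) amounts to clipping at zero and spreading the mass deficit uniformly over [−M, M]; a minimal sketch (the grid-based integration and the function name are illustrative):

```python
def normalize_deficit(p_vals, xs, dx, M):
    # Modification (3.6.25) on a grid: clip the estimate at zero, then add
    # the constant eta_M = (1 - integral of max(0, p_n)) / (2M) on [-M, M]
    # so that the result integrates to one.
    clipped = [max(0.0, v) for v in p_vals]
    deficit = 1.0 - sum(clipped) * dx
    eta = deficit / (2.0 * M)
    return [v + (eta if abs(x) <= M else 0.0) for v, x in zip(clipped, xs)]

# Hypothetical example: an estimate equal to 0.45 on [-1, 1] (total mass 0.9)
dx = 0.0005
xs = [-1.0 + (i + 0.5) * dx for i in range(4000)]
vals = [0.45] * 4000
bar = normalize_deficit(vals, xs, dx, M=1.0)
total = sum(bar) * dx  # equals 1
```

The uniform correction is what produces the extra 3/(2M) term in Theorem 3.6.2, which can be made arbitrarily small by increasing M.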

THEOREM 3.6.2. Let p_n(x) be an arbitrary estimator of a density function p(x)
satisfying the conditions:

(1) p_n(x) is almost surely square integrable;

(2) almost surely,

    ∫_{-∞}^{∞} max{0, p_n(x)} dx ≤ 1.

Then for any n, almost surely,

    ∫_{-∞}^{∞} (p̄_n(x) − p(x))² dx ≤ ∫_{-∞}^{∞} (p_n(x) − p(x))² dx + 3/(2M),

where p̄_n(x) is the estimator given by (3.6.25).



PROOF. We have

    ∫_{-∞}^{∞} (p̄_n(x) − p(x))² dx
        = ∫_{−M}^{M} (max{0, p_n(x)} + η_M − p(x))² dx + ∫_{|x|>M} (max{0, p_n(x)} − p(x))² dx
        ≤ ∫_{-∞}^{∞} (max{0, p_n(x)} − p(x))² dx + 2η_M ∫_{−M}^{M} (max{0, p_n(x)} − p(x)) dx + 2Mη_M²
        ≤ ∫_{-∞}^{∞} (p_n(x) − p(x))² dx + 3/(2M).    □

Taking, for example, M = 3/(2δ), where δ is an arbitrary positive number,
we arrive at the following assertion.

COROLLARY 3.6.2. For M ≥ 3/(2δ), we have

    MISE(p̄_n(x)) ≤ MISE(p_n(x)) + δ.

Theorems 3.6.1 and 3.6.2 show that there are no reasons to avoid the use
of estimators producing estimates which are not densities. This is especially
important with respect to kernel estimators based on the so-called sine kernel
and superkernels. The sine kernel is the function

    K(x) = sin x / (πx)

with the Fourier transform

    φ(t) = { 1, |t| ≤ 1,
           { 0, |t| > 1,

defined as the Cauchy principal value of the corresponding integral:

    φ(t) = v.p. ∫_{-∞}^{∞} e^{itx} (sin x / (πx)) dx.

The superkernels are defined as absolutely integrable functions K(x) with

    ∫_{-∞}^{∞} K(x) dx = 1

whose Fourier transforms are equal to one on the interval [−1, 1]. An example
of a superkernel is

    K(x) = sin x sin(2x) / (πx²),

whose Fourier transform is

    φ(t) = { 1,            |t| ≤ 1,
           { (3 − |t|)/2,  1 < |t| < 3,
           { 0,            |t| ≥ 3.

In the remainder of this section, we concentrate on the kernel estimator
with the sine kernel, which usually gives better approximation than other
kernel estimators and can be studied much more easily. Its defects (possible
negativity and non-integrability) can easily be corrected by the modification
procedure consisting of the transition to the estimator p̃_n(x) given by (3.6.3). After
this transition, the approximating properties of the estimator even improve, due
to Theorem 3.6.1.
Thus, from now on we use the notation p_n(x) for the kernel estimator with
the sine kernel (shortly, the sine estimator). In terms of the empirical characteris-
tic function, it is of the form

    p_n(x) = (1/2π) ∫_{−1/h_n}^{1/h_n} e^{−itx} f_n(t) dt,    (3.6.26)

where f_n(t) is the empirical characteristic function. Suppose that the charac-
teristic function f(t) of the underlying density p(x) is integrable, and denote

    I = (1/2π) ∫_{-∞}^{∞} |f(t)| dt.

First, we obtain relations for the sine estimator similar to those given by
Lemmas 3.5.1 and 3.5.2 (we cannot apply Lemmas 3.5.1 and 3.5.2 themselves, since now
K(x) is not integrable).

LEMMA 3.6.3. For the estimator p_n(x) defined by (3.6.26), we have

    sup_x MSE(p_n(x)) ≤ (1/(4π²)) ( ∫_{|t|>1/h_n} |f(t)| dt )² + 2I/(π n h_n)    (3.6.27)

and

    MISE(p_n(x)) ≤ (1/(2π)) ∫_{|t|>1/h_n} |f(t)|² dt + 1/(π n h_n).    (3.6.28)

PROOF. Let us prove the first inequality. We have

    MSE(p_n(x)) = E | (1/2π) ∫_{-∞}^{∞} e^{−itx} f(t) dt − (1/2π) ∫_{−1/h_n}^{1/h_n} e^{−itx} f_n(t) dt |²
        = E | (1/2π) ∫_{|t|>1/h_n} e^{−itx} f(t) dt + (1/2π) ∫_{−1/h_n}^{1/h_n} e^{−itx} (f(t) − f_n(t)) dt |²
        = | (1/2π) ∫_{|t|>1/h_n} e^{−itx} f(t) dt |² + E | (1/2π) ∫_{−1/h_n}^{1/h_n} e^{−itx} (f(t) − f_n(t)) dt |².

Let us estimate the second term on the right-hand side. Denote it by T₂.
Taking into account that

    E f_n(u) f_n(v) = (1/n) f(u + v) + (1 − 1/n) f(u) f(v),

we obtain

    T₂ = (1/(2π)²) (1/n) ∫_{−1/h_n}^{1/h_n} ∫_{−1/h_n}^{1/h_n} e^{−i(u+v)x} (f(u + v) − f(u)f(v)) du dv
       = (1/(2πn)) ∫_{−1/h_n}^{1/h_n} (1/2π) ∫_{−1/h_n+v}^{1/h_n+v} e^{−itx} f(t) dt dv
         − (1/n) ( (1/2π) ∫_{−1/h_n}^{1/h_n} e^{−itx} f(t) dt )².

Therefore,

    T₂ ≤ (1/(2πn)) ∫_{−1/h_n}^{1/h_n} dv · (1/2π) ∫_{-∞}^{∞} |f(t)| dt
         + (1/n) · (1/2π) ∫_{-∞}^{∞} |f(t)| dt · (1/2π) ∫_{−1/h_n}^{1/h_n} ds = 2I/(π n h_n).

We thus arrive at (3.6.27).

Let us prove (3.6.28). We may use relation (3.5.9) with

    f̃(t) = { f(t), |t| ≤ 1/h_n,
            { 0,    otherwise;

therefore,

    MISE(p_n(x)) = (1/2π) ( ∫_{−1/h_n}^{1/h_n} E|f_n(t) − f(t)|² dt + ∫_{|t|>1/h_n} |f(t)|² dt ),

and it suffices to show that

    (1/2π) ∫_{−1/h_n}^{1/h_n} E|f_n(t) − f(t)|² dt ≤ 1/(π n h_n).

Taking into account that

    E|f_n(t)|² = 1/n + (1 − 1/n) |f(t)|²,

we obtain

    ∫_{−1/h_n}^{1/h_n} E|f_n(t) − f(t)|² dt = (1/n) ∫_{−1/h_n}^{1/h_n} (1 − |f(t)|²) dt
        ≤ (1/n) ∫_{−1/h_n}^{1/h_n} dt = 2/(n h_n).    □

Now we obtain some estimates for the MISE and MSE of the sine estimator
depending on the smoothness of the underlying density. First we consider
the non-smooth case, where the density to be estimated is not assumed to be
differentiable or even continuous.

THEOREM 3.6.3. Let p(x) have bounded variation, V(p) = V < ∞, and let p_n(x) be
the sine estimator. If h_n = h₀/√n, then

    MISE(p_n(x)) ≤ (1/(π√n)) (V²h₀ + 1/h₀).

PROOF. Making use of relation (3.6.28) of Lemma 3.6.3 and Theorem 2.5.3, we
obtain

    MISE(p_n(x)) ≤ (1/(2π)) ∫_{|t|>1/h_n} |f(t)|² dt + 1/(π n h_n)
        ≤ (1/(2π)) · 2V² ∫_{1/h_n}^{∞} dt/t² + 1/(π n h_n)
        = (1/π) (V²h_n + 1/(n h_n)) = (1/(π√n)) (V²h₀ + 1/h₀).    □

COROLLARY 3.6.3. Let the conditions of Theorem 3.6.3 be satisfied. Then for
each n,

    inf_{h₀>0} MISE(p_n(x)) ≤ 2V/(π√n).
3.6. Non-parametric density estimation II 223

COROLLARY 3.6.4. Let p(x) be a unimodal density function, and let p_n(x) be the sine
estimator. If p(x) is bounded,

    sup_x p(x) ≤ a < ∞,

and h_n = h₀/√n, then

    MISE(p_n(x)) ≤ (1/(π√n)) (4a²h₀ + 1/h₀).

Now consider the case where the density to be estimated is m times dif-
ferentiable, m ≥ 1. It will be shown that in this case the upper bound for the
MISE of the sine estimator is of the order n^{−2m/(2m+1)}, which in essence cannot be
achieved (for m ≥ 2) if kernel estimators are used with kernels being density
functions.

THEOREM 3.6.4. Let p(x) be m times differentiable (m ≥ 1), and let p^{(m)}(x) be a
function of bounded variation, V(p^{(m)}) = V_m < ∞. If p_n(x) is the sine estimator,
and

    h_n = h₀ n^{−1/(2m+1)},

then

    MISE(p_n(x)) ≤ (1/π) ( (2(m+1)/(2m+1)) V_m^{(2m+1)/(m+1)} h₀^{2m} + 1/h₀ ) n^{−2m/(2m+1)}.    (3.6.29)

PROOF. We have

    ∫_{|t|>1/h_n} |f(t)|² dt = h_n^{2m} ∫_{|t|>1/h_n} (1/(h_n|t|))^{2m} |t|^{2m} |f(t)|² dt
        ≤ h_n^{2m} ∫_{|t|>1/h_n} |t|^{2m} |f(t)|² dt
        ≤ h_n^{2m} ∫_{-∞}^{∞} |t|^{2m} |f(t)|² dt.    (3.6.30)

Let us estimate the integral on the right-hand side, making use of Theo-
rem 2.5.4. We have

    |f(t)| ≤ min{1, V_m/|t|^{m+1}};

therefore, with T = V_m^{1/(m+1)},

    ∫_{-∞}^{∞} |t|^{2m} |f(t)|² dt ≤ 2 ∫_0^T t^{2m} dt + 2V_m² ∫_T^{∞} dt/t²
        = (2/(2m+1)) T^{2m+1} + 2V_m²/T
        = (4(m+1)/(2m+1)) V_m^{(2m+1)/(m+1)}.    (3.6.31)

Thus, from inequality (3.6.28) of Lemma 3.6.3 and relations (3.6.30) and
(3.6.31), we obtain (3.6.29).    □

COROLLARY 3.6.5. Let the conditions of Theorem 3.6.4 be satisfied. Then for
each n = 1, 2, ...,

    inf_{h₀>0} MISE(p_n(x)) ≤ ((2m+1)/(2πm)) [4(m+1)]^{1/(2m+1)} V_m^{1/(m+1)} n^{−2m/(2m+1)}.

THEOREM 3.6.5. Let p(x) be m times differentiable, and let p^{(m)}(x) be a function of
bounded variation, V(p^{(m)}) = V_m < ∞. If p_n(x) is the sine estimator, and

    h_n = h₀ n^{−1/(2m−1)},

then
then

(m + l) 2 y2m/(m+l)^2(m-1)
supMSE(p(x)) <
* m

+2I + Jiy mm/ ( m + 1 ) ) _L -2(m-l)/(2m-l)


m m J ho.

The proof of this theorem is similar to that of Theorem 3.6.4; one just needs
to use relation (3.6.27) of Lemma 3.6.3 instead of relation (3.6.28) and take into
account that, by virtue of Theorem 2.5.4,

2 J-v^m*1) 2 J\t\>v^m*1) |f |m+1

< I [yl/(m+l) + _^_ym/(m+l)l


Lm m m J

COROLLARY 3.6.6. Let the conditions of Theorem 3.6.5 be satisfied. Then for
each n = 1, 2, ...,

2m - 1 m + 1 2/(2m1)
inf supMSE(p(x))<
>0 " m
m + y(m-l)/(/7i+l)
(2m-2)/(2m-l)
y2/(m+l)n-2(m-l)/(2m-1)
m(m 1)

Now we proceed to the 'supersmooth' case, which we define in terms of
characteristic functions (although this class of distributions can be defined in
terms of density functions as well, a description in terms of characteristic
functions is simpler, more natural, and more convenient for our purposes).
A distribution F with characteristic function f(t) is said to be supersmooth if
for some α > 0 and γ > 0,

    ∫_{-∞}^{∞} e^{γ|t|^α} |f(t)| dt < ∞.

THEOREM 3.6.6. Let the characteristic function f(t) of p(x) satisfy the relation

    ∫_{-∞}^{∞} e^{γ|t|^α} |f(t)| dt = κ < ∞

for some α > 0 and γ > 0. If p_n(x) is the sine estimator, and

    h_n = ( (1/γ) log(h₀n) )^{−1/α},

then

    MISE(p_n(x)) ≤ κ/(2π n h₀) + (1/(πn)) ( (1/γ) log(h₀n) )^{1/α}.    (3.6.32)

PROOF. We have

    ∫_{|t|>1/h_n} |f(t)|² dt ≤ ∫_{|t|>1/h_n} |f(t)| dt = ∫_{|t|>1/h_n} e^{−γ|t|^α} e^{γ|t|^α} |f(t)| dt
        ≤ e^{−γ/h_n^α} ∫_{-∞}^{∞} e^{γ|t|^α} |f(t)| dt = κ/(n h₀).

Using this bound and inequality (3.6.28) of Lemma 3.6.3, we obtain (3.6.32).    □

THEOREM 3.6.7. Let the conditions of Theorem 3.6.6 be satisfied. Then

    sup_x MSE(p_n(x)) ≤ κ²/(4π²n²h₀²) + (2I/(πn)) ( (1/γ)(log h₀ + log n) )^{1/α}.

The proof of the theorem is similar to that of Theorem 3.6.6 (inequality
(3.6.27) is used instead of (3.6.28)).
Theorems 3.6.6 and 3.6.7 can be improved for one subclass of supersmooth
densities. The result is quite curious.

THEOREM 3.6.8. Let the characteristic function f(t) of p(x) satisfy the following condition:
there exists T > 0 such that f(t) = 0 for |t| > T. If p_n(x) is the sine estimator, and

    h_n ≤ 1/T,

then

    sup_x MSE(p_n(x)) ≤ 2I/(π n h_n) ≤ 2T/(π² n h_n),

and

    MISE(p_n(x)) ≤ 1/(π n h_n).

In particular, if h_n = const = 1/T, then

    sup_x MSE(p_n(x)) ≤ 2T²/(π² n),

and

    MISE(p_n(x)) ≤ T/(π n).

The validity of the theorem immediately follows from inequalities (3.6.27)
and (3.6.28) of Lemma 3.6.3: the integrals on the right-hand sides of (3.6.27)
and (3.6.28) vanish when h_n ≤ 1/T.
Theorem 3.6.8 implies, in particular, that if the characteristic function of
the underlying distribution vanishes for large values of the argument, and
one uses the sine estimator for approximation, then p_n(x) converges to p(x) as
n → ∞ even when h_n does not converge to zero.
Results similar to Theorems 3.6.3-3.6.7 can be obtained for kernel
estimators based on superkernels.

3.7. Tests for independence


Beginning with this section, we deal with testing statistical hypotheses on the
basis of empirical characteristic functions. In this section, we study
the problem of testing independence.
Let X_j = (X_{j1}, ..., X_{jm}), j = 1, ..., n, be independent and identically distribut-
ed m-dimensional (m ≥ 2) random vectors with common unknown distribution
function F(x), x = (x₁, ..., x_m) ∈ R^m. Denote by F_k(x_k) the kth marginal distri-
bution function, i.e., the common distribution function of the random variables
X_{1k}, ..., X_{nk}, 1 ≤ k ≤ m. The characteristic functions corresponding to the dis-
tribution functions F(x) and F_k(x_k), 1 ≤ k ≤ m, are denoted by f(t) and f_k(t_k), 1 ≤ k ≤ m,
respectively.
Consider the empirical distribution functions F_n(x) and F_{nk}(x_k), 1 ≤ k ≤ m,
corresponding to the distribution functions F(x) and F_k(x_k), 1 ≤ k ≤ m, and their
characteristic functions (empirical characteristic functions) f_n(t) and

    f_{nk}(t_k) = (1/n) Σ_{j=1}^n e^{it_k X_{jk}} = ∫_{-∞}^{∞} e^{it_k x} dF_{nk}(x) = f_n(0, ..., 0, t_k, 0, ..., 0),

1 ≤ k ≤ m.
The hypothesis of independence of the components of the random vectors X_j
can be formulated in the following two equivalent forms:

    H₀: F(x₁, ..., x_m) = Π_{k=1}^m F_k(x_k),    (x₁, ..., x_m) ∈ R^m,    (3.7.1)

and

    H₀: f(t₁, ..., t_m) = Π_{k=1}^m f_k(t_k),    (t₁, ..., t_m) ∈ R^m.    (3.7.2)

Tests for independence can be based, respectively, on the examination of (3.7.1)
(see, e.g., (Hoeffding, 1948b; Blum et al., 1961)) or of (3.7.2) (see (Csörgő, 1985;
Feuerverger, 1993; Kankainen, 1995)).
We consider tests of the second kind. Test statistics of these tests
are some measures of the deviation of the multivariate empirical characteristic
function f_n(t) from the product of its marginals Π_{k=1}^m f_{nk}(t_k).
We present here a description of the test of (Csörgő, 1985), based on the
test statistic

    S_n(t^{(0)}) = √n ( f_n(t^{(0)}) − Π_{k=1}^m f_{nk}(t_k^{(0)}) ),

where t^{(0)} is a specially chosen point of R^m (the exact definition will be given
below), or the test statistic

    S_n(t^{(n)}) = √n ( f_n(t^{(n)}) − Π_{k=1}^m f_{nk}(t_k^{(n)}) ),

where t^{(n)} is a random point which is a consistent estimator of t^{(0)}.

The point t^{(0)} is defined as follows. Let B be an arbitrary large compact set
in R^m, and let 𝒞(B) be the space of continuous m-variate complex-valued functions
defined on B with the sup metric. Consider the random function

    Y_n(t) = √n ( f_n(t) − f(t) ).

By virtue of Theorem 3.2.4, Y_n(t) converges weakly in 𝒞(B) to a Gaussian
complex-valued random function Y_F(t) having the same cross-covariance ma-
trix as Y_n(t), provided that condition (3.2.7) is satisfied (see Section 3.2). Con-
dition (3.2.7) is a weak tail condition on the underlying distribution which is
satisfied if

    ∫_{R^m} (log⁺ ||x||)^{1+ε} dF(x) < ∞

holds for an arbitrarily small ε > 0.

Consider the zero mean complex Gaussian process

    S_F(t) = Y_F(t) − Σ_{k=1}^m Y_F(0, ..., 0, t_k, 0, ..., 0) Π_{l=1, l≠k}^m f_l(t_l).

Then (Csörgő, 1985), under H₀, S_n(t) converges weakly in 𝒞(B) to S_F(t) if and
only if condition (3.2.7) holds.
Under H₀,

    σ²(t) = E|S_F(t)|²
          = 1 − Π_{k=1}^m |f_k(t_k)|² − Σ_{k=1}^m (1 − |f_k(t_k)|²) Π_{l=1, l≠k}^m |f_l(t_l)|².

By definition, t^{(0)} is the point maximizing σ²(t) on B. The idea of this choice is
that at t^{(0)} the random function |S_F(t)|, and hence |S_n(t)| for large n, is most
variable.
Since the function σ²(t), and hence the point t^{(0)}, is unknown in practice,
t^{(0)} is replaced for each n by its estimate t^{(n)}, which maximizes

    σ_n²(t) = 1 − Π_{k=1}^m |f_{nk}(t_k)|² − Σ_{k=1}^m (1 − |f_{nk}(t_k)|²) Π_{l=1, l≠k}^m |f_{nl}(t_l)|²,

an estimator of σ²(t).
Under some additional conditions, the limiting distribution of the test
statistic can be obtained under H₀. Let us introduce some notation. De-
note the real and imaginary parts of f(t), f_n(t), f_k(t_k), and f_{nk}(t_k), as usual,
by the letters u and v with the same indices and arguments, and those of the
random function Y_F(t) by R_F(t) and I_F(t). Then (see Section 3.1),

    σ^{11}(s, t) = E R_F(s) R_F(t) = ½[u(s + t) + u(s − t) − 2u(s)u(t)],

    σ^{12}(s, t) = E R_F(s) I_F(t) = ½[v(s + t) − v(s − t) − 2u(s)v(t)],

    σ^{22}(s, t) = E I_F(s) I_F(t) = ½[u(s − t) − u(s + t) − 2v(s)v(t)].

Denote

    A_k(t) = ℜ Π_{l=1, l≠k}^m f_l(t_l) = cos( Σ_{l=1, l≠k}^m arg f_l(t_l) ) Π_{l=1, l≠k}^m |f_l(t_l)|,

    B_k(t) = ℑ Π_{l=1, l≠k}^m f_l(t_l) = sin( Σ_{l=1, l≠k}^m arg f_l(t_l) ) Π_{l=1, l≠k}^m |f_l(t_l)|,

where

    arg f_l(t_l) = arctan( v_l(t_l) / u_l(t_l) ).
For the sake of brevity, we will use the notation t_k for the vector
(0, ..., 0, t_k, 0, ..., 0) ∈ R^m. Let S_F^{(1)}(t) and S_F^{(2)}(t) be the real and imaginary parts
of the random function S_F(t). We have

    σ₁₁(s, t) = E S_F^{(1)}(s) S_F^{(1)}(t)
        = σ^{11}(s, t) − Σ_{k=1}^m [A_k(t)σ^{11}(s, t_k) − B_k(t)σ^{12}(s, t_k)]
          − Σ_{k=1}^m [A_k(s)σ^{11}(t, s_k) − B_k(s)σ^{12}(t, s_k)]
          + Σ_{k=1}^m Σ_{l=1}^m [A_k(s)A_l(t)σ^{11}(s_k, t_l) − A_k(s)B_l(t)σ^{12}(s_k, t_l)
          − B_k(s)A_l(t)σ^{12}(t_l, s_k) + B_k(s)B_l(t)σ^{22}(s_k, t_l)],

    σ₁₂(s, t) = E S_F^{(1)}(s) S_F^{(2)}(t)
        = σ^{12}(s, t) − Σ_{k=1}^m [B_k(t)σ^{11}(s, t_k) + A_k(t)σ^{12}(s, t_k)]
          − Σ_{k=1}^m [A_k(s)σ^{12}(s_k, t) − B_k(s)σ^{22}(s_k, t)]
          + Σ_{k=1}^m Σ_{l=1}^m [A_k(s)B_l(t)σ^{11}(s_k, t_l) + A_k(s)A_l(t)σ^{12}(s_k, t_l)
          − B_k(s)B_l(t)σ^{12}(t_l, s_k) − B_k(s)A_l(t)σ^{22}(s_k, t_l)],

    σ₂₂(s, t) = E S_F^{(2)}(s) S_F^{(2)}(t)
        = σ^{22}(s, t) − Σ_{k=1}^m [B_k(t)σ^{12}(t_k, s) + A_k(t)σ^{22}(s, t_k)]
          − Σ_{k=1}^m [B_k(s)σ^{12}(s_k, t) + A_k(s)σ^{22}(s_k, t)]
          + Σ_{k=1}^m Σ_{l=1}^m [B_k(s)B_l(t)σ^{11}(s_k, t_l) + B_k(s)A_l(t)σ^{12}(s_k, t_l)
          + A_k(s)B_l(t)σ^{12}(t_l, s_k) + A_k(s)A_l(t)σ^{22}(s_k, t_l)].

Denote σ²₁₁(t) = σ₁₁(t, t), σ²₁₂(t) = σ₁₂(t, t), σ²₂₂(t) = σ₂₂(t, t), and consider
the matrices

    Σ(t) = ( σ²₁₁(t)  σ²₁₂(t) )        Σ⁻¹(t) = (1/D(t)) (  σ²₂₂(t)  −σ²₁₂(t) )
           ( σ²₁₂(t)  σ²₂₂(t) ),                          ( −σ²₁₂(t)   σ²₁₁(t) ),

where D(t) is the determinant of the matrix Σ(t):

    D(t) = det Σ(t) = σ²₁₁(t)σ²₂₂(t) − (σ²₁₂(t))²,

and their estimators Σ_n(t) and Σ_n⁻¹(t), where

    D_n(t) = det Σ_n(t) = σ²₁₁,n(t)σ²₂₂,n(t) − (σ²₁₂,n(t))²,

and σ²₁₁,n(t), σ²₁₂,n(t), σ²₂₂,n(t) are obtained upon replacing u(t), v(t), and f(t)
and their marginals by u_n(t), v_n(t), and f_n(t) and their marginals, respectively,
everywhere in the respective definitions of σ²₁₁(t), σ²₁₂(t), and σ²₂₂(t).
We introduce the maximum variance quadratic form statistic

    M_n = (S_n^{(1)}(t^{(n)}), S_n^{(2)}(t^{(n)})) Σ_n⁻¹(t^{(n)}) (S_n^{(1)}(t^{(n)}), S_n^{(2)}(t^{(n)}))′

        = ( [S_n^{(1)}(t^{(n)})]² σ²₂₂,n(t^{(n)}) − 2 S_n^{(1)}(t^{(n)}) S_n^{(2)}(t^{(n)}) σ²₁₂,n(t^{(n)})
            + [S_n^{(2)}(t^{(n)})]² σ²₁₁,n(t^{(n)}) ) / D_n(t^{(n)}).

THEOREM 3.7.1. Let a compact set B be chosen in such a way that the point t^{(0)}
maximizing σ²(t) on B is unique, that is,

    σ²(t^{(0)}) > σ²(t)

for all t ∈ B such that t ≠ t^{(0)}. If, in addition, condition (3.2.7) is satisfied,
then the distribution function of M_n converges, as n → ∞, to the χ² distribution
function with two degrees of freedom.

The theorem is proved in (Csörgő, 1985).

Theorem 3.7.1 thus suggests the rejection of H₀ if M_n > χ₂²(α), where χ₂²(α)
is the (1 − α)-point of the χ² distribution with two degrees of freedom.

The described test based on the test statistic S_n(t^{(n)}) is not consistent in the
general case. To demonstrate this, let us construct an example of a bivariate
characteristic function f(s, t) such that

    f(s^{(0)}, t^{(0)}) = f(s^{(0)}, 0) f(0, t^{(0)}),

but

    f(s, t) ≢ f(s, 0) f(0, t).

Consider the characteristic function

    f(s, t) = e^{−|s+t|}.

It is clear that

    f(s, t) ≢ f(s, 0) f(0, t).

Let us demonstrate that

    f(s^{(0)}, t^{(0)}) = f(s^{(0)}, 0) f(0, t^{(0)}).

We have

    σ²(s, t) = 1 − |f(s, 0)|²|f(0, t)|² − (1 − |f(s, 0)|²)|f(0, t)|² − (1 − |f(0, t)|²)|f(s, 0)|²
             = (1 − |f(s, 0)|²)(1 − |f(0, t)|²);

hence, for any compact set, say, of the form

    B = {(s, t): −a ≤ s ≤ b, −a ≤ t ≤ b, a, b > 0, a < b},

we obtain s^{(0)} = t^{(0)} = b, i.e.,

    f(s^{(0)}, t^{(0)}) = e^{−2b} = f(s^{(0)}, 0) f(0, t^{(0)}).
In the particular case of bivariate independence, a consistent test was pro-
posed in (Feuerverger, 1993). The test is based on the test statistic

    ∫_{-∞}^{∞} ∫_{-∞}^{∞} ( |f'_n(s, t) − f'_{n1}(s) f'_{n2}(t)|² / ((1 − e^{−s²})(1 − e^{−t²})) ) W(s, t) ds dt,

where f'_n(s, t) and f'_{n1}(s), f'_{n2}(t) are the empirical characteristic function of
specially transformed data and its marginals, and W(s, t) is an appropriate
weight function.
For arbitrary m, tests based on test statistics of the form

    ∫_{R^m} | f_n(t) − Π_{k=1}^m f_{nk}(t_k) |² w(t) dt,

where w(t) is a weight function, were studied in (Kankainen, 1995; Kankainen
& Ushakov, 1998).
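A discretized statistic of this weighted-L² form is easy to compute for m = 2; a sketch with a Gaussian weight evaluated on a finite grid (the grid, the weight, and the normalization are illustrative choices, not the exact statistics of the cited papers):

```python
import cmath
import math

def l2_independence_stat(pairs, t_grid):
    # n * sum over a grid of |f_n(t1, t2) - f_n1(t1) f_n2(t2)|^2 with a
    # Gaussian weight: a discretized statistic of the weighted-L2 form.
    n = len(pairs)

    def ecf2(t1, t2):
        return sum(cmath.exp(1j * (t1 * x + t2 * y)) for x, y in pairs) / n

    def ecf1(t, idx):
        return sum(cmath.exp(1j * t * p[idx]) for p in pairs) / n

    stat = 0.0
    for t1 in t_grid:
        for t2 in t_grid:
            w = math.exp(-0.5 * (t1 * t1 + t2 * t2))
            stat += abs(ecf2(t1, t2) - ecf1(t1, 0) * ecf1(t2, 1)) ** 2 * w
    return n * stat

grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
# Perfectly dependent pairs: the statistic is far from zero
dep = [(float(i), float(i)) for i in range(-5, 6)]
s_dep = l2_independence_stat(dep, grid)
# A full product set: f_n factorizes exactly, so the statistic vanishes
ind = [(x, y) for x in (0.0, 1.0, 2.0) for y in (0.0, 0.5)]
s_ind = l2_independence_stat(ind, grid)
```

On a full product set the empirical characteristic function factorizes exactly into its marginals, so the statistic is zero up to rounding, while for functionally dependent data it stays bounded away from zero.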

3.8. Tests for symmetry


In this section, we deal with the problem of testing for symmetry of a univariate
distribution on the basis of the empirical characteristic function approach. The
starting point of this approach is the property that a distribution function is
symmetric about the origin if and only if the corresponding characteristic
function is real, i.e., its imaginary part vanishes. However, there is a difficulty
in the use of this property: a non-symmetric distribution may have a
characteristic function which is real at all points of some neighborhood of the
origin (see Example 21 of Appendix A); therefore, if a test statistic involves
the values of the empirical characteristic function only from some bounded
interval, then the corresponding test is not consistent in the general case. This
is the case for the two tests considered in this section. The difficulty disappears if
the underlying distribution satisfies some regularity conditions, in particular,
if the characteristic function of the underlying distribution is analytic.
Indeed, let f(t) be an analytic characteristic function, real-valued in a
neighborhood of the origin. Then ℜf(t) is also an analytic characteristic func-
tion (see Section 1.7), and f(t) and ℜf(t) coincide in some neighborhood of the
origin. By virtue of the uniqueness theorem for analytic functions, f(t) = ℜf(t)
for all real t, i.e., f(t) is real.
Hereafter in this section (except as otherwise stated) we will assume that
this condition (the analyticity of the characteristic function of the underlying
distribution) is satisfied. This condition is equivalent (Theorem 1.7.1) to the
relation

(3.8.1)

for some h > 0 (F(x) is the distribution function). Of course, (3.8.1) is quite
a restrictive condition. However, as pointed out in (Feuerverger & Mureika,
1977), 'the restriction is less troublesome than might first be thought': we can
reduce the general case to the considered one by some transformation of the
data. Indeed, let X be a random variable with distribution function F(x), and
H(x) be any absolutely continuous, strictly increasing distribution function
which is symmetric about the origin. Then X is symmetric about the origin if
and only if 2H(X) − 1 is. This means that testing symmetry of a distribution
function F(x) is equivalent to testing symmetry of the distribution function

    F_H(x) = 0 for x < −1,
    F_H(x) = F(H^{−1}((x + 1)/2)) for |x| ≤ 1,
    F_H(x) = 1 for x > 1,

which, obviously, satisfies (3.8.1). The sample values X₁, ..., X_n should be
replaced by 2H(X₁) − 1, ..., 2H(X_n) − 1.
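This transformation is easy to carry out in practice. A minimal sketch (our own illustration, not from the text), taking H to be the standard normal distribution function, computed via `math.erf`:

```python
import math

def Phi(x):
    # standard normal distribution function: absolutely continuous,
    # strictly increasing, and symmetric about the origin
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def to_bounded(sample):
    # replace each X_j by 2*H(X_j) - 1; the transformed law lives on (-1, 1),
    # so it trivially satisfies the exponential-moment condition (3.8.1)
    return [2.0 * Phi(x) - 1.0 for x in sample]
```

Since the map x ↦ 2H(x) − 1 is odd when H is symmetric, symmetry of the sample about the origin is preserved exactly.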
Let X\,...,Xn be a random sample from a distribution function F(x) (which
is supposed to satisfy (3.8.1)) with characteristic function f(t). Denote the

empirical distribution function and the empirical characteristic function of


X₁, ..., X_n, as usual, by F_n(x) and f_n(t). The first test we consider in this section
is based on the test statistic

    T_n = ∫_{−∞}^{∞} [ℑf_n(t)]² dG(t),

where G(x) is a distribution function symmetric about the origin. Let g(t) be the
characteristic function of G(x). Then, making use of the elementary formula
sin x sin y = ½(cos(x − y) − cos(x + y)) and of the fact that g is real and
g(x) = ∫ cos(tx) dG(t) because G is symmetric, we obtain

    T_n = ∫_{−∞}^{∞} [ (1/n) Σ_{j=1}^n sin(tX_j) ]² dG(t)

        = (1/n²) Σ_{j=1}^n Σ_{k=1}^n ∫_{−∞}^{∞} sin(tX_j) sin(tX_k) dG(t)

        = (1/(2n²)) Σ_{j=1}^n Σ_{k=1}^n ∫_{−∞}^{∞} [cos(t(X_j − X_k)) − cos(t(X_j + X_k))] dG(t)

        = (1/(2n²)) Σ_{j=1}^n Σ_{k=1}^n [g(X_j − X_k) − g(X_j + X_k)].
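The two expressions for T_n can be checked against each other numerically. In the sketch below (our own illustration), G is taken to be the standard normal law, so that g(x) = ∫ cos(tx) dG(t) = e^{−x²/2}:

```python
import math

def Tn_double_sum(xs):
    # T_n = (1/(2n^2)) * sum_{j,k} [g(X_j - X_k) - g(X_j + X_k)],
    # with G = N(0,1), hence g(x) = exp(-x^2/2)
    n = len(xs)
    g = lambda x: math.exp(-x * x / 2.0)
    s = sum(g(xj - xk) - g(xj + xk) for xj in xs for xk in xs)
    return s / (2.0 * n * n)

def Tn_integral(xs, lim=10.0, steps=20000):
    # direct form: integral of [Im f_n(t)]^2 against the N(0,1) density,
    # approximated by the trapezoid rule on [-lim, lim]
    n = len(xs)
    h = 2.0 * lim / steps
    total = 0.0
    for i in range(steps + 1):
        t = -lim + i * h
        im = sum(math.sin(t * x) for x in xs) / n
        dens = math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)
        w = 0.5 if i in (0, steps) else 1.0
        total += w * im * im * dens * h
    return total
```

A perfectly symmetric sample gives T_n = 0, since the multisets {X_j − X_k} and {X_j + X_k} then coincide.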
Denote

    T = ∫_{−∞}^{∞} [ℑf(t)]² dG(t),

    σ² = ∫_{−∞}^{∞} ∫_{−∞}^{∞} ℑf(t₁) ℑf(t₂) [2ℜ(f(t₁ − t₂) − f(t₁ + t₂))
         − 4 ℑf(t₁) ℑf(t₂)] dG(t₁) dG(t₂).

THEOREM 3.8.1. If σ² > 0, then √n(T_n − T) is asymptotically normal with zero
mean and variance σ². If σ² = 0, then the asymptotic distribution of √n(T_n − T)
is degenerate at zero.

PROOF. We set

    U_n = (n(n − 1)/2)^{−1} Σ_{j<k} [g(X_j − X_k) − g(X_j + X_k)],

where the summation is over all n(n − 1)/2 combinations of 2 integers chosen from
{1, ..., n}. Then

    √n(T_n − T) = (1/(2n√n)) Σ_{j=1}^n [g(0) − g(2X_j)]
                  + ((n − 1)/(2√n)) (U_n − 2T) − T/√n.

The first and third terms on the right converge to zero. Consider the middle
term. Since U_n is a Hoeffding U-statistic, it is, by the Hoeffding theorem (see,
e.g. (Fraser, 1957, Theorem 5.1)), asymptotically normal with mean

    E[g(X₁ − X₂) − g(X₁ + X₂)] = 2T

and variance

    (4/n) Var( E^{X₂}[g(X₁ − X₂) − g(X₁ + X₂)] ) = σ²/n

(where E^X Y denotes the conditional expectation of Y given X). This implies
the assertion of the theorem.

Denote

    W_n(t) = (1/√n) Σ_{j=1}^n sin(tX_j).

If F(x) is symmetric about the origin, the covariance function of the random
process W_n(t) is

    K(s, t) = ½[f(s − t) − f(s + t)].                                    (3.8.2)

As concerns the distribution of T_n under the null hypothesis, the following
assertion is true.

THEOREM 3.8.2. Let F(x) be symmetric about the origin, and let W(t) be a zero
mean Gaussian process having the same covariance function (3.8.2) as W_n(t).
Denote

    T_∞ = ∫_{−∞}^{∞} W²(t) dG(t);

then

    nT_n →^D T_∞ as n → ∞.

PROOF. It is known (see, e.g. (Billingsley, 1968, Theorem 4.2)) that if

    X_{un} →^D X_u as n → ∞

for each u,

    X_u →^D X as u → ∞,

and, for each ε > 0,

    lim_{u→∞} limsup_{n→∞} P(|X_{un} − Y_n| > ε) = 0,                   (3.8.3)

then

    Y_n →^D X as n → ∞.

We set

    X_u = ∫_{−u}^{u} W²(t) dG(t),        X = ∫_{−∞}^{∞} W²(t) dG(t) = T_∞,

    X_{un} = ∫_{−u}^{u} W_n²(t) dG(t),   Y_n = ∫_{−∞}^{∞} W_n²(t) dG(t) = nT_n.

Using the Markov inequality, we obtain

    P(|X_{un} − Y_n| > ε) = P( ∫_{|t|>u} W_n²(t) dG(t) > ε )

        ≤ (1/ε²) ∫_{|t₁|>u} ∫_{|t₂|>u} E[W_n²(t₁) W_n²(t₂)] dG(t₁) dG(t₂),

which implies (3.8.3), and similarly,

    P(|X_u − X| > ε) ≤ (1/ε²) ∫_{|t₁|>u} ∫_{|t₂|>u} E[W²(t₁) W²(t₂)] dG(t₁) dG(t₂),

which implies X_u →^D X as u → ∞.

The distribution of T_∞ can be found if G(x) satisfies some additional conditions.
Assume that G(x) is absolutely continuous with density function q(x)
such that q(x) = 0 for |x| > M, 0 < M < ∞, and q(x) is continuous on the
interval [−M, M].

THEOREM 3.8.3. The characteristic function of the random variable

    T_∞ = ∫_{−M}^{M} W²(t) q(t) dt

is given by

    φ(t) = Π_{j=1}^{∞} (1 − 2iλ_j t)^{−1/2},

where {λ_j} is the solution set of the eigenvalue equation

    λ_j ψ_j(t) = ∫_{−M}^{M} ψ_j(s) K(s, t) (q(s)q(t))^{1/2} ds.

PROOF. According to the Karhunen–Loève theorem (see, e.g. (Ash, 1965, Appendix)),

    W(t)(q(t))^{1/2} = Σ_{j=1}^{∞} Z_j ψ_j(t),    |t| ≤ M,

the convergence being in mean square and uniform in t, and

    Z_j = ∫_{−M}^{M} W(t)(q(t))^{1/2} ψ_j(t) dt

being independent normal variables having zero means and variances λ_j. The
functions ψ_j are taken orthonormal, and Σ_j λ_j < ∞. Hence

    T_∞ = Σ_{j=1}^{∞} Z_j²

is distributed as a sum of independent λ_j χ₁² variables.

The distribution of a weighted sum of independent chi-squared variates
can be found by numerical integration of the characteristic function with the
aid of the fast Fourier transform. The mean and variance of T_∞ may be
computed directly from T_∞ = ∫_{−M}^{M} W²(t) dG(t):

    E T_∞ = ½ ∫_{−M}^{M} [1 − f(2t)] dG(t),

    Var T_∞ = ∫_{−M}^{M} ∫_{−M}^{M} Cov(W²(s), W²(t)) dG(s) dG(t)

            = 2 ∫_{−M}^{M} ∫_{−M}^{M} K²(s, t) dG(s) dG(t).

Here f(t) may be estimated by

    u_n(t) = (1/n) Σ_{j=1}^n cos(tX_j).

When the underlying distribution is symmetric, u_n(t) becomes uniformly
close to f(t). This leads to a test procedure which, to within the accuracy of
the chi-squared approximation, will have asymptotic level α. More details and further
possibilities are contained in (Feuerverger & Mureika, 1977).
The second test we consider in this section is based on a special measure of
asymmetry and can be used for testing symmetry about an unspecified centre.
First we introduce the characteristic symmetry function, based on the characteristic
function of the underlying distribution, whose behavior is indicative
of symmetry or its absence. Let X be a random variable with distribution
function F(x) and characteristic function f(t). Denote the real and imaginary
parts of the latter by u(t) and v(t) respectively. Let r₀ be the first positive zero
of u(t), 0 < r₀ ≤ ∞ (recall that we write r₀ = ∞ if u(t) > 0 for all real t). Define
the characteristic symmetry function of X (or of F(x)) as

    θ(t) = (1/t) arctan( v(t)/u(t) ),    0 < t < r₀.

If F(x) is symmetric about some θ, then

    u(t) = cos(tθ) E cos(t(X − θ)),

    v(t) = sin(tθ) E cos(t(X − θ)),

and, since |tθ| < π/2 if t < r₀, θ(t) = θ. The symmetry hence implies that θ(t)
is a constant. Conversely, let θ(t) be equal to a constant θ for 0 < t < r₀. Then
tan(tθ) = v(t)/u(t), and hence sin(tθ)u(t) − cos(tθ)v(t) = 0 (|t| < r₀). Besides,
sin(tθ)u(t) − cos(tθ)v(t) = −ℑ(e^{−itθ} f(t)). Therefore, the characteristic function
f_θ(t) = e^{−itθ} f(t) of X − θ is real on the interval (−r₀, r₀). Under condition
(3.8.1), which is supposed to be satisfied, this implies that f_θ(t) is real for all
real t.

Thus, under condition (3.8.1), the constancy of θ(t) is indicative of symmetry.
Further, a straightforward calculation yields, as t → 0,

    θ(t) = θ − (t²/3!) E(X − θ)³ + (t⁴/5!) [E(X − θ)⁵ − 10 E(X − θ)³ E(X − θ)²] + O(t⁶),

with

    EX = θ = lim_{t→0} θ(t).
->
Thus, as t increases from zero, θ(t) either remains constant or, in the case
of asymmetry, departs from a constant value with direction and magnitude
initially determined by the odd central moments of X.
Now let X₁, ..., X_n be a random sample from the distribution function F(x).
Define the empirical characteristic symmetry function as

    θ_n(t) = (1/t) arctan( v_n(t)/u_n(t) ),    t ∈ [a, b],

where u_n(t) and v_n(t) are the real and imaginary parts of the empirical characteristic
function f_n(t), and [a, b] is some appropriately chosen interval with
0 < a < b < r₀. The idea of the proposed test consists in rejecting the hypothesis
of symmetry if θ_n(t) departs sufficiently from a constant value.
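A small numerical sketch of θ_n (our own illustration; u_n(t) > 0 is assumed on the working interval):

```python
import math

def theta_n(sample, t):
    # empirical characteristic symmetry function:
    # theta_n(t) = (1/t) * arctan(v_n(t) / u_n(t))
    n = len(sample)
    u = sum(math.cos(t * x) for x in sample) / n
    v = sum(math.sin(t * x) for x in sample) / n
    return math.atan(v / u) / t
```

For a sample that is exactly symmetric about a point c, θ_n(t) = c for all small t, while for an asymmetric sample θ_n varies with t.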
Under the weak tail condition (3.2.7), θ_n(t) almost surely converges to θ(t) uniformly
on each compact set (Csörgő & Heathcote, 1982, Theorem 2). Deviations
of θ_n(t) from θ(t) are measured by the process

    Γ_n(t) = √n (θ_n(t) − θ(t)),    t ∈ [a, b].

This process converges weakly (Csörgő & Heathcote, 1982) to a zero mean
Gaussian process Γ(t) with covariance function σ(s, t) given by (3.8.4),

where

    h(s, t) = u(s − t)[u(s)u(t) + v(s)v(t)] + u(s + t)[v(s)v(t) − u(s)u(t)]

            + v(s − t)[u(t)v(s) − u(s)v(t)] − v(s + t)[u(s)v(t) + u(t)v(s)].   (3.8.5)

A sufficient condition for this convergence is the weak tail condition that, for
some δ > 0,

    E(log⁺ |X|)^{1+δ} < ∞,

which is, obviously, satisfied if (3.8.1) holds.
The test statistic of the proposed test uses the variance function of the
stochastic process Γ(t), namely σ²(t) = σ(t, t). Before introducing it, we make
the following remark. A sensitive measure of the deviation of θ_n(t) from a
constant is

    Δ_n = √n sup_{a≤s,t≤b} |θ_n(s) − θ_n(t)|.

Under symmetry,

    Δ_n = sup_{a≤s,t≤b} |Γ_n(s) − Γ_n(t)|,

and Δ_n converges in distribution to the random variable

    Δ = sup_{a≤s,t≤b} |Γ(s) − Γ(t)| = | sup_{a≤s≤b} Γ(s) − inf_{a≤s≤b} Γ(s) |.

Unfortunately, nothing is known about the distribution of Δ even if the distribution
of X is known. Therefore, a test statistic is suggested which, on the
one hand, is close (in some sense) to Δ_n and, on the other hand, whose limiting
distribution can be obtained (a detailed motivation of the choice of the test
statistic is contained in (Csörgő & Heathcote, 1987)). Let s₀ and t₀ be the
points which, respectively, maximize and minimize the variance function σ²(t)
over t ∈ [a, b]. Let σ_n(s, t) be the estimator of the covariance function σ(s, t)
of Γ(t) obtained by replacing u(t) and v(t) in (3.8.4) and (3.8.5) with their empirical
counterparts u_n(t) and v_n(t). Set σ_n²(t) = σ_n(t, t). Consider the following
estimators of the points s₀ and t₀:

    s_n = min{ s ∈ [a, b]: σ_n²(s) = sup_{a≤t≤b} σ_n²(t) },

    t_n = max{ t ∈ [a, b]: σ_n²(t) = inf_{a≤s≤b} σ_n²(s) }.

If s₀ and t₀ are unique, i.e., σ²(s₀) > σ²(t) for all t ∈ [a, b], t ≠ s₀, and σ²(t₀) <
σ²(t) for all t ∈ [a, b], t ≠ t₀ (which seems to be the normal situation, and
anyway this can be achieved by the appropriate choice of the working interval
[a, b]), then, due to the results of Section 3.2, s_n → s₀ and t_n → t₀ almost surely
as n → ∞.

The test statistic of the proposed test is

    S_n = √n |θ_n(s_n) − θ_n(t_n)| / [σ_n²(s_n) + σ_n²(t_n) − 2σ_n(s_n, t_n)]^{1/2}.

Under symmetry, the quantity

    √n |θ_n(s_n) − θ_n(t_n)| = |Γ_n(s_n) − Γ_n(t_n)|

converges in distribution to |Γ(s₀) − Γ(t₀)|. The latter is the modulus of
a normally distributed random variable with zero mean and variance σ²(s₀) +
σ²(t₀) − 2σ(s₀, t₀). Thus, if the condition of the uniqueness of s₀ and t₀ and
(3.8.1) are satisfied, then we have the following result concerning the limiting
distribution of the test statistic.

THEOREM 3.8.4. If the underlying distribution is symmetric about some center,
then for any x > 0,

    lim_{n→∞} P(S_n < x) = 2Φ(x) − 1,

with Φ(x) being the standard normal distribution function. If, on the contrary,
θ(s₀) ≠ θ(t₀), then

    S_n → ∞ in probability as n → ∞.

In the remainder of this section we do not assume that condition (3.8.1)
is satisfied. We present an approach which makes it possible to reduce the
problem of testing symmetry to the problem of testing independence, which
was considered in Section 3.7. This approach is based on the following lemma.
Denote the median of a random variable X by m(X).

LEMMA 3.8.1. Let X be a random variable such that P(X = m(X)) = 0. If
the distribution of X is symmetric about some θ (which, naturally, must be
equal to m(X)), then the random variables |X − m(X)| and sgn(X − m(X)) are
independent. Conversely, if the random variables |X − m(X)| and sgn(X − m(X))
are independent, then the distribution of X is symmetric about m(X).

The converse statement cannot be strengthened as follows: if the random
variables |X − θ| and sgn(X − θ) are independent for some θ, then the distribution
of X is symmetric about θ. Below, after the proof of the lemma, we
present an example demonstrating that there may exist θ, not equal to the median,
for which |X − θ| and sgn(X − θ) are independent. But, of course, in this case
the distribution of X is not symmetric.

PROOF. Assume that the distribution of X is symmetric about m(X). Then, for any
x > 0, taking into account that P(X = m(X)) = 0, we obtain

    P(|X − m(X)| < x, sgn(X − m(X)) = 1)

        = P(|X − m(X)| sgn(X − m(X)) < x, sgn(X − m(X)) = 1)

        = P(X − m(X) < x, sgn(X − m(X)) = 1)

        = P(0 < X − m(X) < x) = ½ P(|X − m(X)| < x)

        = P(|X − m(X)| < x) P(sgn(X − m(X)) = 1),

and similarly,

    P(|X − m(X)| < x, sgn(X − m(X)) = −1)

        = P(|X − m(X)| < x) P(sgn(X − m(X)) = −1),

i.e., |X − m(X)| and sgn(X − m(X)) are independent.


Conversely, let |X − m(X)| and sgn(X − m(X)) be independent. Let A be
an arbitrary Borel subset of the positive half-line [0, ∞). We have (due to the
independence)

    P(|X − m(X)| ∈ A, sgn(X − m(X)) = 1)

        = P(|X − m(X)| ∈ A) P(sgn(X − m(X)) = 1) = ½ P(|X − m(X)| ∈ A).

On the other hand,

    P(|X − m(X)| ∈ A, sgn(X − m(X)) = 1)

        = P(X − m(X) ∈ A, sgn(X − m(X)) = 1) = P(X − m(X) ∈ A).

Hence,

    P(X − m(X) ∈ A) = ½ P(|X − m(X)| ∈ A)

        = ½ [P(X − m(X) ∈ A) + P(X − m(X) ∈ −A)],

which yields

    P(X − m(X) ∈ A) = P(X − m(X) ∈ −A),

i.e., the distribution of X − m(X) is symmetric.


Let us demonstrate that the independence of |X − θ| and sgn(X − θ) may take
place in the case θ ≠ m(X); but then, naturally, F(x) is not symmetric. Take
θ = 0. Let G(x) be a continuous distribution function such that G(0) = 0. Set
H(x) = 1 − G(−x) and define the distribution function F(x) as

    F(x) = pG(x) + (1 − p)H(x),

where 0 < p < 1 and p ≠ 1/2. The non-symmetry of F(x) is obvious. Let us show
that if X has distribution function F(x), then |X| and sgn X are independent.
Indeed, for any x > 0,

    P(|X| < x) = F(x) − F(−x)

        = pG(x) + (1 − p)H(x) − pG(−x) − (1 − p)H(−x) = G(x),

and obviously,

    P(sgn X = 1) = p.

On the other hand,

    P(|X| < x, sgn X = 1) = P(0 < X < x)

        = pG(x) + (1 − p)H(x) − pG(0) − (1 − p)H(0) = pG(x),

i.e.,

    P(|X| < x, sgn X = 1) = P(|X| < x) P(sgn X = 1).

The equality

    P(|X| < x, sgn X = −1) = P(|X| < x) P(sgn X = −1)

can be validated similarly.
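A discrete analogue of this construction makes the check exact. In the sketch below (our own illustration; the book's G is continuous, but here G is a two-point law on the positive axis so that everything can be verified with rational arithmetic):

```python
from fractions import Fraction as Fr

def build_pmf(p, g):
    # F = p*G + (1-p)*H, where G lives on positive points x with masses g[x]
    # and H is the mirror image of G on the negative axis
    pmf = {}
    for x, gx in g.items():
        pmf[x] = p * gx          # contribution of G
        pmf[-x] = (1 - p) * gx   # contribution of H
    return pmf
```

With p ≠ 1/2 the law is plainly asymmetric, yet |X| and sgn X factorize exactly.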


The condition P(X = m(X)) = 0 cannot be omitted: without this condition
the lemma does not hold. Indeed, consider a random variable X taking values
−1, 0, 1 with probabilities 1/4, 1/2, and 1/4 respectively. Then X is distributed
symmetrically about the origin, but

    P(|X| = 0, sgn X = 1) = 0 ≠ (1/2)·(1/4) = P(|X| = 0) P(sgn X = 1).

In actual practice, the condition P(X = m(X)) = 0 (which is satisfied when
the distribution of X is continuous) is not restrictive at all. The general case can
be reduced to this one by the following 'randomization' of the data. Let X₁, ..., X_n
be a random sample from the distribution F(x), which is not necessarily continuous.
Let Y₁, ..., Y_n be independent and identically distributed random variables
with a symmetric continuous distribution function G(x) such that the
corresponding characteristic function has at most a denumerable set of zeros
(for example, G(x) may be normal or uniform), not depending on X₁, ..., X_n
(they can be generated by simulation, for example). Then the distribution
F * G is symmetric if and only if F is symmetric; therefore the initial problem
of testing the symmetry of F from the data X₁, ..., X_n can be replaced by the
problem of testing the symmetry of F * G from the data X₁ + Y₁, ..., X_n + Y_n. In
the latter problem, the underlying distribution F * G is continuous.

3.9. Testing for normality


The normal distribution has always been the most widely used probability law
in statistical analysis. Many statistical methods are based on the assumption
of normality of the underlying distribution. For this reason, a wide class of
procedures has been proposed for testing for normality, and there is ever-growing
interest in this problem.
In this section, we study the empirical characteristic function approach to
the problem of testing for normality. This approach has a number of advan-
tages, in particular, the tests presented below are affine invariant, consistent
against each fixed non-normal alternative distribution and feasible for any di-
mension and any sample size. Two tests (or, more exactly, two classes of tests)
will be considered, one is based on the test statistic which is the maximal
deviation of the squared modulus of the studentized empirical characteristic
function from the characteristic function of the normal law in some neighbour-
hood of the origin, and another is based on the test statistic which is a weighted
integral of the squared modulus of the difference between the studentized em-
pirical characteristic function and the characteristic function of the standard
normal law. Since one of the advantages of the empirical characteristic function
approach is its feasibility for any dimension, we consider the multivariate case.
Recall that bold lowercase letters denote m-dimensional non-random row
vectors, bold capital letters denote both matrices and random vectors, and the
prime denotes transposition. If x = (x₁, ..., x_m) and f is an m-variate function,
then f(x₁, ..., x_m), f(x), and f(x′) mean the same. In particular, the scalar
product of two vectors may be written in four forms: (x, y), (x, y′), (x′, y),
(x′, y′).
Let X₁, X₂, ... be independent identically distributed random vectors in R^m,
m ≥ 1, with common distribution function F(x) and characteristic function f(t).
Denote the class of all non-degenerate m-variate normal distributions by N_m.
The problem is to test the composite hypothesis

    H₀: F ∈ N_m

against the general alternative of non-normality, F ∉ N_m, from X₁, ..., X_n.

Denote the sample covariance matrix of X₁, ..., X_n by S_n:

    S_n = (1/n) Σ_{j=1}^n (X_j − X̄_n)′ (X_j − X̄_n),

where, as usual,

    X̄_n = (1/n) Σ_{j=1}^n X_j

is the sample mean vector. F_n(x) and f_n(t) denote the empirical distribution
function and empirical characteristic function of X₁, ..., X_n respectively.

The first test is based on the test statistic

    M_n(τ) = √n Δ_n(τ),

where

    Δ_n(τ) = 1 if S_n is singular, and Δ_n(τ) = D_n(τ) if S_n is non-singular,

    D_n(τ) = sup_{t∈B(τ)} | |f_n(S_n^{−1/2} t′)|² − e^{−‖t‖²} |,

S_n^{−1/2} is the symmetric positive definite square root of the inverse S_n^{−1} of S_n,
B(τ) = {t = (t₁, ..., t_m): |t_j| ≤ τ, j = 1, ..., m} is a cubic box centered at the origin
with sides of length 2τ, and τ is some positive parameter (in (Csörgő, 1986), it
is recommended to choose τ = 1.4/√m).

Thus, the test consists in rejecting the hypothesis of normality for large
values of M_n(τ). Since D_n(τ) is undefined if S_n is singular, it is replaced in this
case by its maximum possible value 1. In practice this will always lead to the
rejection of H₀.
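In dimension m = 1 the statistic is easy to compute; a sketch (our own illustration), with the supremum approximated by a grid search:

```python
import math

def D_n(sample, tau, grid=400):
    # one-dimensional D_n(tau): sup over t in [-tau, tau] of
    # | |f_n(t / sqrt(s2))|^2 - exp(-t^2) |, s2 being the sample variance;
    # returns 1 (the maximal possible value) if the variance degenerates
    n = len(sample)
    mean = sum(sample) / n
    s2 = sum((x - mean) ** 2 for x in sample) / n
    if s2 == 0.0:
        return 1.0
    c = 1.0 / math.sqrt(s2)
    best = 0.0
    for i in range(grid + 1):
        t = -tau + 2.0 * tau * i / grid
        re = sum(math.cos(c * t * x) for x in sample) / n
        im = sum(math.sin(c * t * x) for x in sample) / n
        best = max(best, abs(re * re + im * im - math.exp(-t * t)))
    return best
```

Note that |f_n|² is unchanged by shifting the data, so only the studentization by the variance matters here.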
The second test is based on the test statistic

    T_n(β) = n W_n(β),

where

    W_n(β) = ∫_{R^m} | Ψ_n(t) − e^{−‖t‖²/2} |² φ_β(t) dt,

    Ψ_n(t) = exp{ −i (S_n^{−1/2} t′, X̄_n) } f_n(S_n^{−1/2} t′)

is the empirical characteristic function of the scaled residuals Y₁, ..., Y_n,
Y_j = (X_j − X̄_n) S_n^{−1/2}, and φ_β(t) is the weight function (some density function).
Again, since W_n(β) is only defined if S_n is non-singular, it is replaced by its
maximum possible value 4 in the case where S_n is not invertible (and in this
case H₀ is rejected).

A good choice for the weight function is

    φ_β(t) = (2πβ²)^{−m/2} exp{ −‖t‖²/(2β²) },                           (3.9.1)

because in this case W_n(β) takes the simple form

    W_n(β) = (1/n²) Σ_{j=1}^n Σ_{k=1}^n exp{ −β²‖Y_j − Y_k‖²/2 }

             − 2(1 + β²)^{−m/2} (1/n) Σ_{j=1}^n exp{ −β²‖Y_j‖²/(2(1 + β²)) }

             + (1 + 2β²)^{−m/2};

so, since the computation of ‖Y_j − Y_k‖² and ‖Y_j‖² involves only S_n^{−1}, not even
the square root S_n^{−1/2} of S_n^{−1} is needed.
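The closed form can be checked against the defining integral; a one-dimensional sketch (our own illustration; the scaled residuals are passed in directly):

```python
import math

def W_closed(ys, beta):
    # closed form of W_n(beta) in dimension m = 1 for scaled residuals ys
    n = len(ys)
    b2 = beta * beta
    s1 = sum(math.exp(-b2 * (yj - yk) ** 2 / 2.0) for yj in ys for yk in ys) / n**2
    s2 = sum(math.exp(-b2 * y * y / (2.0 * (1.0 + b2))) for y in ys) / n
    return s1 - 2.0 * (1.0 + b2) ** -0.5 * s2 + (1.0 + 2.0 * b2) ** -0.5

def W_integral(ys, beta, lim=12.0, steps=20000):
    # defining form: integral over t of |Psi_n(t) - exp(-t^2/2)|^2 times
    # the N(0, beta^2) density, approximated by the trapezoid rule
    n = len(ys)
    h = 2.0 * lim / steps
    total = 0.0
    for i in range(steps + 1):
        t = -lim + i * h
        re = sum(math.cos(t * y) for y in ys) / n - math.exp(-t * t / 2.0)
        im = sum(math.sin(t * y) for y in ys) / n
        dens = math.exp(-t * t / (2.0 * beta * beta)) / (beta * math.sqrt(2.0 * math.pi))
        w = 0.5 if i in (0, steps) else 1.0
        total += w * (re * re + im * im) * dens * h
    return total
```

The agreement of the two functions is a direct numerical confirmation of the closed form.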
Before considering the problem of consistency of the tests we establish the
following auxiliary assertion.

LEMMA 3.9.1. If the modulus of an m-variate characteristic function f(t) coincides
with an m-variate normal characteristic function in some neighborhood
of the origin, then f(t) is a normal characteristic function.

PROOF. The function g(t) = |f(t)|², which is a characteristic function, coincides
with a normal characteristic function in the same neighborhood of the origin;
therefore each of its univariate projections g_e(t) = |f_e(t)|² coincides with
some univariate normal characteristic function in a (one-dimensional) neighbourhood
of the origin. Hence, by virtue of Theorem 1.7.7, all g_e(t) are normal
characteristic functions. Since g_e(t) = f_e(t) · f̄_e(t), this implies, by the Cramér
theorem (Theorem 1.7.6), that all f_e(t) are normal. Using Theorem 1.8.18, we
come to the conclusion that f(t) is an m-variate normal characteristic function.

If the distribution function F(x) has a non-singular covariance matrix Σ, then

    S_n^{−1/2} → Σ^{−1/2} as n → ∞

almost surely. Further, by Theorem 3.2.1, f_n(t) converges to f(t) almost surely
uniformly on each compact subset of R^m. Therefore,

    f_n(S_n^{−1/2} t′) → f(Σ^{−1/2} t′),

    |f_n(S_n^{−1/2} t′)|² → |f(Σ^{−1/2} t′)|²,

    exp{ −i (S_n^{−1/2} t′, X̄_n) } → exp{ −i (Σ^{−1/2} t′, μ) }

as n → ∞ almost surely uniformly on each compact subset of R^m (here μ = EX).
Hence we obtain, almost surely,

    lim_{n→∞} Δ_n(τ) = sup_{t∈B(τ)} | |f(Σ^{−1/2} t′)|² − e^{−‖t‖²} |,

    lim_{n→∞} W_n(β) = ∫_{R^m} | e^{−i(Σ^{−1/2} t′, μ)} f(Σ^{−1/2} t′) − e^{−‖t‖²/2} |² φ_β(t) dt.

The non-negative limits are zero if and only if H₀ is true. Therefore, both
tests are consistent against all alternatives with non-degenerate covariance
matrix (M_n(τ) → ∞ and T_n(β) → ∞ almost surely as n → ∞).
The question arises whether or not this is true for other alternatives (having
components with infinite variances). The problem is important because
the set of remaining alternatives contains stable distributions. It is easy to
obtain the positive answer in the univariate case: if m = 1 and EX² = ∞, then,
almost surely,

    Δ_n(τ) → sup_{t∈B(τ)} |1 − e^{−t²}| = 1 − e^{−τ²} > 0 as n → ∞,

    W_n(β) → ∫_{−∞}^{∞} |1 − e^{−t²/2}|² φ_β(t) dt > 0 as n → ∞.

If m ≥ 2, then the problem is non-trivial, but it turns out that the answer is
still positive.

THEOREM 3.9.1. Tests based on the test statistics M_n(τ) and T_n(β) are consistent
against every alternative.
In other words, if F(x) is non-normal, then

    M_n(τ) → ∞ in probability as n → ∞,

and

    T_n(β) → ∞ in probability as n → ∞,

for any weight function φ_β(t) which is positive in some neighborhood of the
origin.

The assertion of the theorem follows from the lemma below. Consider the
event (here, for the sake of brevity, we denote the determinant of a matrix by
|·| instead of det)

    Ω₀ = ∩_{k=1}^{∞} ∪_{n=k}^{∞} {|S_n| > 0},

consisting in that S_n is infinitely many times non-singular, and let Ω̄₀ be its
complement.

LEMMA 3.9.2. Almost surely,

    liminf_{n→∞} Δ_n(τ) ≥ I_{Ω̄₀} + I_{Ω₀} inf_S sup_{t∈B(τ)} | |f(S^{−1} t′)|² − e^{−‖t‖²} |,

    liminf_{n→∞} W_n(β) ≥ 4 I_{Ω̄₀}
        + I_{Ω₀} inf_{μ∈R^m} inf_S ∫_{R^m} | e^{−i(S^{−1}t′, μ)} f(S^{−1} t′) − e^{−‖t‖²/2} |² φ_β(t) dt,

where the infimum in S is taken over all positive definite symmetric m×m matrices S.

PROOF. Denote the (j,k)th element of the matrix S_n^{−1/2} by s_{jk}^{−1/2}(n). Let K > 0
be arbitrary. Introduce the symmetric truncated matrix A_n(K) with the (j,k)th
element

    a_{jk}(n; K) = s_{jk}^{−1/2}(n)            if |s_{jk}^{−1/2}(n)| ≤ K,
    a_{jk}(n; K) = K sgn(s_{jk}^{−1/2}(n))     if |s_{jk}^{−1/2}(n)| > K,

where 1 ≤ j, k ≤ m. Denote

    R_n(K) = {A_n(K) t′: t ∈ B(τ)},    R_n = {S_n^{−1/2} t′: t ∈ B(τ)}.
We, obviously, have R_n(K) ⊂ R_n; hence

    Δ_n(τ) = I_{|S_n|=0} + I_{|S_n|>0} sup_{t∈R_n} | |f_n(t)|² − exp{−‖S_n^{1/2} t′‖²} |

      ≥ I_{|S_n|=0} + I_{|S_n|>0} sup_{t∈R_n(K)} | |f_n(t)|² − exp{−‖S_n^{1/2} t′‖²} |

      ≥ I_{|S_n|=0} − I_{|S_n|>0} sup_{t∈R_n(K)} | |f_n(t)|² − |f(t)|² |

        + I_{|S_n|>0} sup_{t∈R_n(K)} | |f(t)|² − exp{−‖S_n^{1/2} t′‖²} |

(we used the Minkowski triangle inequality for the sup norm). Since each
of the regions R_n(K) is a subset of one and the same m-dimensional ball
(whose radius depends only on K, τ, and m), the first supremum on the
right-hand side converges to zero almost surely (Theorem 3.2.1). On the event
Ω₀, there exists a subsequence n_j, depending upon the elementary events in
Ω₀, such that n_j → ∞ as j → ∞ and I_{|S_{n_j}|>0} = 1, j = 1, 2, .... Hence by an
obvious consideration we obtain

    liminf_{n→∞} Δ_n(τ)
        ≥ I_{Ω̄₀} + I_{Ω₀} liminf_{j→∞} sup_{t∈B(τ)} | |f(A_{n_j}(K) t′)|² − exp{−‖S_{n_j}^{1/2} A_{n_j}(K) t′‖²} |

almost surely, where

    ‖S‖ = max{ |s_{jk}|: 1 ≤ j, k ≤ m }

is the maximum norm of an m×m matrix S = (s_{jk}). The deterministic factor in
the lower bound is

    L(K) = inf_{A,S} sup_{t∈B(τ)} | |f(A t′)|² − exp{−‖S A t′‖²} |,

where the infimum is taken over all pairs of m×m symmetric matrices A and
S such that S is positive definite, A = S^{−1} whenever ‖S^{−1}‖ ≤ K, and ‖A‖ = K

whenever ‖S^{−1}‖ > K. Since I_{Ω̄₀} + I_{Ω₀} L(K) is a lower bound for each K > 0, we
obtain the first statement of the lemma by taking the limit as K → ∞.
Now we prove the second statement. We have

    W_n(β) ≥ 4 I_{|S_n|=0}

        + I_{|S_n|>0} ∫_{B(τ)} | e^{−i(S_n^{−1/2} t′, X̄_n)} f_n(S_n^{−1/2} t′) − e^{−‖t‖²/2} |² φ_β(t) dt

      ≥ 4 I_{|S_n|=0}

        + I_{|S_n|>0} ∫_{R_n(K)} | e^{−i(t, X̄_n)} f_n(t) − e^{−‖S_n^{1/2} t′‖²/2} |² φ_β(S_n^{1/2} t′) |S_n^{1/2}| dt.

Using now the Minkowski inequality for the L₂ norm, we see that the last
integral is no less than

    [ ( ∫_{R_n(K)} |f_n(t) − f(t)|² φ_β(S_n^{1/2} t′) |S_n^{1/2}| dt )^{1/2}

      − ( ∫_{R_n(K)} | e^{−i(t, X̄_n)} f(t) − e^{−‖S_n^{1/2} t′‖²/2} |² φ_β(S_n^{1/2} t′) |S_n^{1/2}| dt )^{1/2} ]².

Again, the first square root in this lower bound converges to zero almost surely.
Hence, by the reasoning used above,

    liminf_{n→∞} W_n(β) ≥ 4 I_{Ω̄₀} + I_{Ω₀} L(K, β)

almost surely, where, with the infimum in A and S taken as above, L(K, β) is
defined to be equal to

    inf_{μ∈R^m} inf_{A,S} ∫_{{A t′: t∈B(τ)}} | e^{−i(t, μ)} f(t) − e^{−‖S t′‖²/2} |² φ_β(S t′) |S| dt.

Obviously,

    lim_{K→∞} L(K, β) = inf_{μ∈R^m} inf_S ∫_{B(τ)} | e^{−i(S^{−1} t′, μ)} f(S^{−1} t′) − e^{−‖t‖²/2} |² φ_β(t) dt,

and letting finally τ → ∞, we obtain the second assertion of the lemma.
PROOF OF THEOREM 3.9.1. Let us begin with proving the first statement. Assume
that M_n(τ) does not tend to infinity in probability as n → ∞. Then, with some
positive probability,

    liminf_{n→∞} Δ_n(τ) = 0,

and hence, by virtue of Lemma 3.9.2,

    inf_S sup_{t∈B(τ)} | |f(S^{−1} t′)|² − e^{−‖t‖²} | = 0,

where the infimum is taken over all positive definite symmetric m×m matrices.
Since f(t) is uniformly continuous on the whole space R^m, there exists a matrix
S₀ from this class such that

    |f(S₀^{−1} t′)|² = e^{−‖t‖²}

for all t ∈ B(τ). Therefore, by virtue of Lemma 3.9.1, f(S₀^{−1} t′), and hence f(t),
are normal characteristic functions, which contradicts the assumption that
F(x) is non-normal.

The second part of the theorem can be proved similarly.

There are some theoretical results concerning limiting distributions of the
test statistics M_n(τ) and T_n(β) under H₀. Their formulations and proofs can
be found in (Csörgő, 1986; Baringhaus & Henze, 1988; Henze & Zirkler, 1990;
Henze & Wagner, 1997). In fact, the limiting distribution for the test statistic
M_n(τ) is not known explicitly. It is only known that

    lim_{n→∞} P(M_n(τ) > y) = P( sup_{t∈B(τ)} |Z(t)| > y ),

where Z(t) is a Gaussian process satisfying the conditions

    Z(t) = Z(−t),    EZ(t) = 0,

    EZ(s)Z(t) = 4 e^{−‖s‖²−‖t‖²} ( cosh(s, t) − 1 − ½ (s, t)² ).

There is also an upper bound for the tail of the limiting distribution of M_n(τ)
(Csörgő, 1986).
(Csrg, 1986).
A similar situation occurs for the test statistic T_n(β): there is a complete
qualitative description of the limiting distribution of T_n(β) under H₀. Namely,
if the weight function φ_β(t) is given by (3.9.1), then under H₀,

    T_n(β) →^D Σ_{k=1}^{∞} δ_k χ_k²  as n → ∞,                           (3.9.2)

where χ_k² (k = 1, 2, ...) are independent χ₁² variables, and δ₁, δ₂, ... are the
eigenvalues of the operator A₁ defined by

    A₁ q(x) = ∫_{R^m} h*(x, y) q(y) φ(y) dy

(q(y) ∈ L₂(R^m, ν), i.e., it is square integrable with respect to the m-dimensional
standard Gaussian measure ν, whose density is denoted here by φ). The kernel
h*(x, y) is known explicitly (see (Henze & Zirkler, 1990)); it is a cumbersome
expression in exp{−β²‖x − y‖²/2}, ‖x‖², ‖y‖², the scalar product (x, y), and
the constant

    γ = β²/(2(1 + β²)).

Another, even more attractive, representation for the limiting distribution
of T_n(β) is of the same form (3.9.2), but with δ₁, δ₂, ... the eigenvalues of the
operator A₂ defined by

    A₂ q(x) = ∫_{R^m} K(x, y) q(y) φ(y) dy

with a simpler kernel K(x, y). However, an explicit form of δ₁, δ₂, ... has not
been found so far, either in the case of the operator A₁ or in the case of A₂;
therefore nothing is known theoretically about the quantitative behavior of
the tail of the limiting distribution of T_n(β) (except some moments).
Thus, for both tests M_n(τ) and T_n(β), approximate computing formulas
and simulation have been used for the calculation of percentage points. Some
formulas and tables are contained in the works indicated above and also in
(Baringhaus & Henze, 1988; Baringhaus et al., 1989).

In conclusion of this section, we point out the close relationship between
the test statistic T_n(β) and a statistic involving a kernel density estimator.
This relationship sheds, in particular, more light on the effect produced by
varying the parameter β.

Taking the weight function φ_β(t) of the form (3.9.1) and using the notation

    Ψ_n(t) = exp{ −i (S_n^{−1/2} t′, X̄_n) } f_n(S_n^{−1/2} t′),

we obtain

    W_n(β) = (2πβ²)^{−m/2} ∫_{R^m} | Ψ_n(t) exp{−‖t‖²/(4β²)} − exp{−¼(2 + β^{−2})‖t‖²} |² dt.

The function Ψ_n(t) exp{−‖t‖²/(4β²)} is the characteristic function of the convolution
of the empirical distribution of the scaled residuals Y₁, ..., Y_n (defined above)
and the normal distribution with zero mean and covariance matrix (1/(2β²)) I_m,
where I_m is the m×m identity matrix. The corresponding distribution is
absolutely continuous (almost surely) with density

    q_n(x) = (1/(n (2πh²)^{m/2})) Σ_{j=1}^{n} exp{ −‖x − Y_j‖²/(2h²) },

where h² = 1/(2β²).

The function exp{−¼(2 + β^{−2})‖t‖²} is the characteristic function of the
normal distribution with zero mean and covariance matrix ρ² I_m, i.e., with
density (2πρ²)^{−m/2} exp{−‖x‖²/(2ρ²)}, where

    ρ² = (2β² + 1)/(2β²).

Thus, making use of the Parseval–Plancherel theorem (Theorem 1.8.8), we
obtain

    W_n(β) = (2π)^{m/2} β^{−m} ∫_{R^m} [ q_n(x) − (2πρ²)^{−m/2} exp{−‖x‖²/(2ρ²)} ]² dx.

Note that q_n(x) is a multivariate nonparametric kernel density estimator
applied to Y₁, ..., Y_n with the standard Gaussian kernel and bandwidth h
(see, e.g. (Silverman, 1986)). So, the role of β is that of a smoothing parameter:
a large value of β, which corresponds to a small bandwidth h, entails a large
variance of q_n(x). On the other hand, a small value of β (corresponding to a
large value of h) will reduce the random variation at the expense of introducing
systematic error into the estimation.
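The identity between the characteristic-function form and the kernel-density form can be checked numerically in dimension one (our own sketch, taking h² = 1/(2β²) and ρ² = (2β² + 1)/(2β²)):

```python
import math

def W_beta(ys, beta):
    # closed-form W_n(beta) in dimension m = 1 for scaled residuals ys
    n = len(ys)
    b2 = beta * beta
    s1 = sum(math.exp(-b2 * (a - b) ** 2 / 2.0) for a in ys for b in ys) / n**2
    s2 = sum(math.exp(-b2 * y * y / (2.0 * (1.0 + b2))) for y in ys) / n
    return s1 - 2.0 / math.sqrt(1.0 + b2) * s2 + 1.0 / math.sqrt(1.0 + 2.0 * b2)

def W_via_density(ys, beta, lim=15.0, steps=30000):
    # kernel-density form: sqrt(2*pi)/beta times the integrated squared
    # difference between the Gaussian kernel estimator q_n (bandwidth h)
    # and the N(0, rho^2) density
    n = len(ys)
    h2 = 1.0 / (2.0 * beta * beta)
    rho2 = (2.0 * beta * beta + 1.0) / (2.0 * beta * beta)
    step = 2.0 * lim / steps
    total = 0.0
    for i in range(steps + 1):
        x = -lim + i * step
        qn = sum(math.exp(-(x - y) ** 2 / (2.0 * h2)) for y in ys) \
             / (n * math.sqrt(2.0 * math.pi * h2))
        p = math.exp(-x * x / (2.0 * rho2)) / math.sqrt(2.0 * math.pi * rho2)
        w = 0.5 if i in (0, steps) else 1.0
        total += w * (qn - p) ** 2 * step
    return total * math.sqrt(2.0 * math.pi) / beta
```

The two computations agree, which also illustrates the smoothing-parameter reading of β.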

3.10. Goodness-of-fit tests based on empirical


characteristic functions
In this section, we deal with the application of the empirical characteristic
function approach to goodness-of-fit tests. This approach has a number of
advantages, in particular, tests for goodness-of-fit of a multivariate distribution
can be constructed in the same way as in the univariate case (in contrast to
many other methods).
Let X₁, ..., X_n be a random sample from an m-dimensional distribution
function F(x), x ∈ R^m, and let F₀(x) be a completely specified m-variate distribution
function. The problem is to test the hypothesis

    H₀: F(x) = F₀(x)

against the general alternative

    H₁: F(x) ≠ F₀(x)

on the basis of X₁, ..., X_n. Denote the characteristic functions of F(x) and F₀(x)
by f(t) and f₀(t) respectively. Then the equivalent form for H₀ and H₁ is

    H₀: f(t) = f₀(t)

against

    H₁: f(t) ≠ f₀(t).

Let, as usual, f_n(t) be the empirical characteristic function of the sample
X₁, ..., X_n. The tests we consider in this section are based on a quadratic
measure of the difference between f_n(t) and f₀(t) evaluated at r points. To
obtain consistency, we let r = r(n) → ∞ as n → ∞.
Denote the real and imaginary parts of a characteristic function f(t) (with
or without an index) by u(t) and v(t) with the same index. Let t₁, ..., t_r ∈ R^m.
Denote

    Z₀(t₁, ..., t_r) = (u₀(t₁), ..., u₀(t_r), v₀(t₁), ..., v₀(t_r)),

    Z(t₁, ..., t_r) = (u(t₁), ..., u(t_r), v(t₁), ..., v(t_r)),

    Z_n(t₁, ..., t_r) = (u_n(t₁), ..., u_n(t_r), v_n(t₁), ..., v_n(t_r)).

We have

    Z_n(t₁, ..., t_r) − Z(t₁, ..., t_r) = (1/n) Σ_{j=1}^n Y_j(t₁, ..., t_r),

where

    Y_j(t₁, ..., t_r) = (cos(t₁, X_j) − u(t₁), ..., cos(t_r, X_j) − u(t_r),

                        sin(t₁, X_j) − v(t₁), ..., sin(t_r, X_j) − v(t_r)).

Let Ω₀(t₁, ..., t_r) be the covariance matrix of Y_j(t₁, ..., t_r) under H₀. This matrix
contains the following elements (see Section 3.1):

    cov(cos(t, X), cos(s, X)) = ½[u₀(t + s) + u₀(t − s) − 2u₀(t)u₀(s)],

    cov(cos(t, X), sin(s, X)) = ½[v₀(t + s) − v₀(t − s) − 2u₀(t)v₀(s)],

    cov(sin(t, X), sin(s, X)) = ½[u₀(t − s) − u₀(t + s) − 2v₀(t)v₀(s)],

    Var(cos(t, X)) = ½[1 + u₀(2t) − 2u₀²(t)],

    Var(sin(t, X)) = ½[1 − u₀(2t) − 2v₀²(t)].

The covariance matrix Ω(t₁, ..., t_r) of Y_j(t₁, ..., t_r) with respect to the true distribution
function F(x) contains the same elements but with u₀(·) and v₀(·)
replaced by u(·) and v(·).

Throughout the rest of this section the arguments (t₁, ..., t_r) of Z_n, Z₀, Z,
Y_j, Ω₀, and Ω are omitted.
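These formulas are straightforward to evaluate. A small sketch (our own illustration) takes for F₀ the uniform distribution on [0, 1], whose characteristic function has u₀(t) = (sin t)/t and v₀(t) = (1 − cos t)/t; at the points t_j = 2πj the resulting covariance matrix is ½I:

```python
import math

def u0(t):
    # real part of the uniform[0,1] characteristic function
    return math.sin(t) / t if t != 0.0 else 1.0

def v0(t):
    # imaginary part of the uniform[0,1] characteristic function
    return (1.0 - math.cos(t)) / t if t != 0.0 else 0.0

def cov_cc(t, s):  # cov(cos(t,X), cos(s,X)) under H0
    return 0.5 * (u0(t + s) + u0(t - s) - 2.0 * u0(t) * u0(s))

def cov_cs(t, s):  # cov(cos(t,X), sin(s,X)) under H0
    return 0.5 * (v0(t + s) - v0(t - s) - 2.0 * u0(t) * v0(s))

def cov_ss(t, s):  # cov(sin(t,X), sin(s,X)) under H0
    return 0.5 * (u0(t - s) - u0(t + s) - 2.0 * v0(t) * v0(s))
```

Here the variance formulas are the s = t cases of the covariance formulas.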
We define the test statistic as

    T_n = (Z_n − Z₀) Ω₀^{−1/2} W Ω₀^{−1/2} (Z_n − Z₀)′,

where Ω₀^{−1/2} is the symmetric positive definite square root of the inverse Ω₀^{−1}
of Ω₀, and W = W(t₁, ..., t_r) is a diagonal weight matrix with non-negative
diagonal elements (the role of the weight matrix W is to direct the power of
the test based on T_n towards different frequencies).
Consider one example of a special case of T_n. It is a one-dimensional test for uniformity (to apply it to other distributions, one must transform the data to uniform on [0,1] random variables). Thus, let m = 1 and let F_0(x) = x, 0 ≤ x ≤ 1, be the distribution function of the uniform distribution on the interval [0,1]. Then

f_0(t) = sin t/t + i(1 − cos t)/t.

Choose t_j = 2πj, j = 1, ..., r. Then Ω_0 = ½I (I is the identity matrix), and, due to the choice of the t_j, Z_0 = 0. Therefore, the test statistic reduces to

T_n = Σ_{j=1}^r w_j [√2 u_n(2πj)]² + Σ_{j=1}^r w_{r+j} [√2 v_n(2πj)]²,   (3.10.1)

where w_j is the jth diagonal element of W.
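The statistic (3.10.1) can be computed directly from a sample. The sketch below implements it for generic diagonal weights (the function name and the illustrative data are assumptions; everything else follows the formulas above):

```python
import math

def Tn_uniformity(xs, weights_cos, weights_sin):
    # statistic (3.10.1) for testing uniformity on [0,1] with t_j = 2*pi*j;
    # weights_cos[j-1] = w_j and weights_sin[j-1] = w_{r+j}
    n = len(xs)
    total = 0.0
    for j in range(1, len(weights_cos) + 1):
        un = sum(math.cos(2.0 * math.pi * j * x) for x in xs) / n
        vn = sum(math.sin(2.0 * math.pi * j * x) for x in xs) / n
        total += weights_cos[j - 1] * 2.0 * un ** 2
        total += weights_sin[j - 1] * 2.0 * vn ** 2
    return total

# a sample placed on a regular grid of [0,1] makes u_n(2*pi*j) and
# v_n(2*pi*j) vanish, so T_n is zero up to rounding
grid = [(k + 0.5) / 100.0 for k in range(100)]
w = [1.0, 1.0, 1.0]
print(Tn_uniformity(grid, w, w))
```

For data far from uniform (say, all observations equal) the statistic is large, which is what the test exploits.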


Let w_j = w_{r+j} = 1/(πj)², j = 1, ..., r. Then (3.10.1) implies that

T_n = Σ_{j=1}^r (a_jn² + b_jn²)/(πj)²,

where

a_jn = (1/n) Σ_{k=1}^n √2 cos(2πjX_k),

b_jn = (1/n) Σ_{k=1}^n √2 sin(2πjX_k).

Assume that the number of points t_1, t_2, ... is infinite: r = ∞. Then it is easy to see that nT_n is similar to the Cramér–von Mises statistic, which is of the form

Σ_{j=1}^∞ μ_jn²/(πj)²,

where

μ_jn = (1/√n) Σ_{k=1}^n √2 cos(πjX_k).

Now consider the general case again. The asymptotic distribution of T_n under the null hypothesis depends on r and the weight matrix W. We consider several cases. The first group of tests are the so-called directional tests. They are characterized by the condition that r is finite and does not depend on n. These tests are not consistent against all alternatives but are useful when one has some information about the alternatives. The limiting distribution of the test statistic for the case of directional tests can be obtained quite easily. Let r be finite and not depend on n. It follows from the multidimensional central limit theorem that under H_0, √n(Z_n − Z_0) is asymptotically normally distributed with zero mean and covariance matrix Ω_0. Therefore, √n Ω_0^{-1/2}(Z_n − Z_0) is asymptotically normal with zero mean and identity covariance matrix. This implies the following theorem.

THEOREM 3.10.1. Let r be a finite integer not depending on n. Then under H_0,

nT_n →_D Σ_{j=1}^{2r} w_j χ_j²   as n → ∞,

where the χ_j² are independent chi-square random variables with one degree of freedom.
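The limiting law of Theorem 3.10.1 has mean Σ_j w_j = tr(W) and variance 2Σ_j w_j² = 2 tr(W²), which is exactly the centering and scaling used in the normal approximations below. In practice, a critical value for a directional test can be approximated by simulating the weighted chi-square sum; a minimal standard-library sketch with illustrative weights (not from the text):

```python
import random

def simulate_weighted_chisq(weights, n_draws=20000, seed=1):
    # draws from sum_j w_j * chi2_1 obtained by squaring standard normals
    rng = random.Random(seed)
    return [sum(w * rng.gauss(0.0, 1.0) ** 2 for w in weights)
            for _ in range(n_draws)]

weights = [0.5, 0.5, 0.25, 0.25]        # hypothetical diagonal of W
draws = sorted(simulate_weighted_chisq(weights))
mean = sum(draws) / len(draws)          # close to tr(W) = 1.5
crit95 = draws[int(0.95 * len(draws))]  # approximate 5% critical value for nT_n
print(mean, crit95)
```

The simulated mean should sit near tr(W), in agreement with the moments of the weighted chi-square limit.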

If there is little knowledge about the alternatives, then consistent tests are preferable to directional tests. Let r = r(n) → ∞ as n → ∞. For a matrix A, denote the largest and the smallest eigenvalues of A by λ_max(A) and λ_min(A), respectively.

THEOREM 3.10.2. Assume that there exists a constant ε > 0 such that λ_min(Ω_0) ≥ ε. If

r/n → 0 as n → ∞

and

√(tr(W⁴))/tr(W²) → 0 as n → ∞,

then under H_0,

(nT_n − tr(W))/√(2 tr(W²)) →_D N(0,1) as n → ∞

(N(0,1) is a random variable having the standard normal distribution).

Theorem 3.10.2 suggests that H_0 is rejected at significance level α if

nT_n > tr(W) + z_α √(2 tr(W²)),

where z_α is the upper α-percentile of the standard normal distribution. As shown in (Fan, 1997), in the considered case the test based on T_n is more powerful against high frequency alternatives than tests based on the empirical distribution function. In actual practice, to implement this test one needs to choose a value for r. Some recommendations and references concerning this question are given in (Fan, 1997).
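The rejection rule translates into a few lines of code. The sketch below (a hypothetical helper, standard library only) computes the threshold tr(W) + z_α √(2 tr(W²)) for a diagonal W given by its diagonal entries:

```python
import math
from statistics import NormalDist

def reject_h0(n, t_n, weights, alpha=0.05):
    # normal-approximation test of Theorem 3.10.2:
    # reject H0 when n*T_n exceeds tr(W) + z_alpha * sqrt(2 tr(W^2))
    tr_w = sum(weights)                  # trace of the diagonal matrix W
    tr_w2 = sum(w * w for w in weights)  # trace of W^2
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return n * t_n > tr_w + z_alpha * math.sqrt(2.0 * tr_w2)
```

With diagonal weights (1, 1) the threshold is 2 + z_0.05 · 2 ≈ 5.29, so an observed nT_n = 6 leads to rejection while nT_n = 5 does not.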
The next theorem gives the asymptotic distribution of T_n when r converges to infinity faster than in Theorem 3.10.2, namely, r = n. Denote the jth element of the vector Ω_0^{-1/2} Y_1' by ν_j.

THEOREM 3.10.3. Let r = n. Assume that there exists a positive constant M such that E|ν_j|⁴ ≤ M, j = 1, ..., 2r. If

√(tr(W⁴))/tr(W²) → 0 as n → ∞

and

tr(W)/n → 0 as n → ∞,

then under H_0,

(nT_n − tr(W))/√(2 tr(W²)) →_D N(0,1) as n → ∞.
The proofs of Theorems 3.10.2 and 3.10.3, as well as those of Theorems 3.10.4 and 3.10.5 below, are contained in (Fan, 1997).
The asymptotic distribution of T_n under H_0 may not always be normal. Under some conditions, it coincides with the distribution of a weighted sum of independent χ_1² random variables.

THEOREM 3.10.4. Let w̄_j = lim_{n→∞} w_j exist, j = 1, ..., 2r. If, as r → ∞ (or r = ∞),

lim tr(W) < ∞,   lim tr(W²) < ∞,   tr(W̄ − W) → 0,

and

tr[(W̄ − W)²] → 0,

where W̄ is the matrix with diagonal elements w̄_j, then under H_0,

nT_n →_D Σ_{j=1}^∞ w̄_j χ_j²   as n → ∞.
The following theorem shows that under appropriate conditions the tests based on T_n are consistent for H_0 against H_1.

THEOREM 3.10.5. 1. Let the conditions of Theorem 3.10.2 or 3.10.3 be satisfied and, in addition, let

(1/n) tr[W Ω_0^{-1/2} Ω Ω_0^{-1/2}] → 0 as n → ∞,   (3.10.2)

(1/n)(Z − Z_0) Ω_0^{-1/2} W Ω_0^{-1/2} Ω Ω_0^{-1/2} W Ω_0^{-1/2} (Z − Z_0)' → 0 as n → ∞,   (3.10.3)

lim inf_{n→∞} (Z − Z_0) Ω_0^{-1/2} W Ω_0^{-1/2} (Z − Z_0)' > 0.   (3.10.4)

If M_1, M_2, ... is a sequence of positive numbers such that

M_n = o(n/√(tr(W²))), n → ∞,

then under H_1,

P((nT_n − tr(W))/√(2 tr(W²)) > M_n) → 1 as n → ∞.

2. Let the conditions of Theorem 3.10.4 and relations (3.10.2)–(3.10.4) be satisfied. If M_1, M_2, ... is a sequence of positive numbers such that

M_n = o(n), n → ∞,

then under H_1,

P(nT_n > M_n) → 1 as n → ∞.

3.11. Notes
Probably the empirical characteristic function first appeared in (Cramér, 1946). Since then and until the mid-1970s, several works were published using empirical characteristic functions for parameter estimation and hypothesis testing: (Heathcote, 1972; Press, 1972a; Kent, 1975; Feigin & Heathcote, 1976; Blum & Susarla, 1977; Heathcote, 1977; Thornton & Paulson, 1977). A systematic study of the empirical characteristic function was initiated by (Feuerverger & Mureika, 1977). After that, empirical characteristic functions have been studied and applied quite extensively.
The asymptotic behavior of the empirical characteristic function and some related statistics was investigated by (Kent, 1975; Feuerverger & Mureika, 1977; Csörgő, 1980; Csörgő, 1981a; Csörgő, 1981c; Csörgő, 1981d; Feuerverger & McDunnough, 1981a; Feuerverger & McDunnough, 1981b; Marcus, 1981; Csörgő & Totik, 1983; Keller, 1988; Kolchinskii, 1989; Feuerverger, 1990; Devroye, 1994).
Theorems 3.2.1 and 3.2.2, as well as Lemma 3.2.1, are due to (Csörgő & Totik, 1983). Before that, in (Feuerverger & Mureika, 1977) for the case m = 1 and in (Csörgő, 1981c) for the case m > 1, a result similar to Theorem 3.2.1 but weaker (slower increase of T_n) was obtained. Theorem 3.2.3 is due to (Feuerverger & Mureika, 1977). The weak convergence of the empirical characteristic process in the space of continuous complex-valued functions on a compact set was studied in (Kent, 1975; Feuerverger & Mureika, 1977; Marcus, 1981; Csörgő, 1981a; Csörgő, 1981d) in the univariate case and in (Csörgő, 1981c) in the multivariate case. Theorem 3.2.4 was obtained in (Marcus, 1981) for the case m = 1 and in (Csörgő, 1981c) for the case m > 1. Theorems 3.2.5 and 3.2.6 are due to (Keller, 1988); Theorem 3.2.7 is due to (Devroye, 1994).
The problem of the first positive zero of the real part of a characteristic function and of an empirical characteristic function was investigated in (Welsh, 1986; Heathcote & Hüsler, 1990; Bräker & Hüsler, 1991). Theorems 3.3.1 and 3.3.2 were obtained in (Welsh, 1986); Theorems 3.3.4–3.3.6 are due to (Heathcote & Hüsler, 1990).
The problem of parameter estimation on the basis of the empirical characteristic function was studied in (Press, 1972a; Press, 1972b; Paulson et al., 1975; Heathcote, 1977; Thornton & Paulson, 1977; Koutrouvelis, 1980b; Csörgő, 1981d; Koutrouvelis, 1981; Koutrouvelis, 1982; Welsh, 1985; Markatou et al., 1995; Markatou & Horowitz, 1995). The first applications of the empirical characteristic function in statistical estimation were related to the estimation of parameters of stable laws. In (Press, 1972a; Press, 1972b), several methods of estimation were proposed which use the empirical characteristic function, and one of them, called the method of moments, was studied in detail in the case of stable characteristic functions. The integrated squared error estimator was considered in a number of works. Consistency and asymptotic normality of this estimator were investigated in (Thornton & Paulson, 1977) with a specific weight function (the normal density), and independently in (Heathcote, 1977) in the general case. Theorem 3.4.2 is due to (Heathcote, 1977) (see also (Csörgő, 1981d)). The method of scale parameter estimation presented at the end of the section (in particular, Theorem 3.4.3) was proposed and investigated in (Markatou et al., 1995; Markatou & Horowitz, 1995).
An introduction to kernel density estimation can be found, for example, in (Silverman, 1986; Wand & Jones, 1995). The characteristic function approach in non-parametric density estimation has been used in a number of works, see, e.g., (Blum & Susarla, 1977; Chiu, 1991; Glad et al., 1999a; Prakasa Rao, 1983). Upper bounds for the MSE and MISE of kernel estimators, presented in Sections 3.5 and 3.6, are based on (Glad et al., 1999a). Theorem 3.6.8 is a special case of an estimate of (Ibragimov & Khasminskii, 1982).
Since improving the rate of decrease of the integrated mean square error of non-parametric kernel density estimators beyond O(n^{-4/5}) requires relaxing the constraint that the density estimate be a density function, that is, be non-negative and integrate to one, a number of methods of correction of density estimators have been proposed, see, e.g., (Gajek, 1986; Hall & Murison, 1993; Glad et al., 1999b). The methods proposed in (Glad et al., 1999b), which are presented in Section 3.6, seem to be the most preferable.
Kernel estimators based on the sinc kernel and superkernels were studied, in particular, in (Watson & Leadbetter, 1963; Konakov, 1972; Davis, 1975; Davis, 1977; Ibragimov & Khasminskii, 1982; Glad et al., 1999a; Glad et al., 1999c).
Tests for independence based on empirical characteristic functions have been developed and studied in (De Silva & Griffiths, 1980; Csörgő & Hall, 1982; Csörgő, 1983; Feuerverger, 1993; Kankainen, 1995; Kankainen & Ushakov, 1998). Section 3.7 is based on (Csörgő, 1983).
The problem of testing for symmetry on the basis of the empirical characteristic function was studied in (Feuerverger & Mureika, 1977; Csörgő & Heathcote, 1982; Csörgő & Heathcote, 1987). The results presented in Section 3.8 are in the main due to these works. The statistic T_n was introduced and Theorems 3.8.1–3.8.3 were obtained in (Feuerverger & Mureika, 1977). In (Csörgő & Heathcote, 1982; Csörgő & Heathcote, 1987), the characteristic symmetry function was introduced and investigated, as well as the test statistic S_n. In particular, they obtained Theorem 3.8.4.
The empirical characteristic function approach to the problem of testing the composite hypothesis of normality has been studied in (Murota, 1981; Murota & Takeuchi, 1981; Epps & Pulley, 1983; Hall & Welsh, 1983; Welsh, 1984; Csörgő, 1986; Baringhaus & Henze, 1988; Csörgő, 1989; Henze & Zirkler, 1990; Henze, 1990; Naito, 1996; Henze & Wagner, 1997). The results presented in Section 3.9 are mainly due to (Csörgő, 1986; Baringhaus & Henze, 1988; Csörgő, 1989; Henze & Wagner, 1997).
The test statistic which is a weighted integral of the squared modulus of the difference between the empirical characteristic function of the residuals and the normal characteristic function was proposed in (Epps & Pulley, 1983) to test for univariate normality. The approach of Epps and Pulley was extended to the multivariate case, developed and investigated in (Baringhaus & Henze, 1988; Csörgő, 1989; Henze & Zirkler, 1990; Henze, 1990; Henze & Wagner, 1997). In (Csörgő, 1986), the maximal deviation test statistic was proposed and investigated as an extension of the test statistic of (Murota & Takeuchi, 1981). Theorem 3.9.1 is due to (Csörgő, 1989).
The empirical characteristic function approach to constructing goodness-of-fit tests was used in (Heathcote, 1972; Feigin & Heathcote, 1976; Koutrouvelis, 1980a; Koutrouvelis & Kellermeier, 1981; Fan, 1997) (the references to works concerning testing for normality were given above separately). The results presented in Section 3.10 are due to (Fan, 1997).
A
Examples

In this appendix, we present several examples, counterexamples and assertions demonstrating properties of characteristic functions which, at first sight, do not seem obvious and sometimes are even unexpected, or demonstrating that certain results concerning characteristic functions cannot be extended (in a certain sense), or giving answers to questions which often arise when one deals with characteristic functions. Many of the examples presented below are contained in the books (Feller, 1971; Lukacs, 1970; Prokhorov & Rozanov, 1969; Romano & Siegel, 1986; Stoyanov, 1987). The most recent examples (except the new ones) are Examples 17 and 19; they are due to (Ramachandran, 1997) and (Ramachandran, 1996b), respectively.

EXAMPLE 1. The characteristic function of a distribution non-degenerate at the origin (not concentrated at the origin) is never determined by its real part. In other words, for any non-degenerate characteristic function f(t), there exists a characteristic function g(t) such that ℜf(t) = ℜg(t) but f(t) ≢ g(t).
Let f(t) ≢ 1. If f(t) is non-symmetric about the origin, i.e., ℑf(t) ≢ 0, then we can set g(t) = ℜf(t), which is a characteristic function by Theorem 1.3.7. Assume that f(t) is symmetric about the origin. Let X be a random variable whose characteristic function is f(t). Denote the characteristic function of the random variable |X| by g(t). Then

ℜg(t) = E cos(t|X|) = E cos(tX) = f(t),

hence ℜf(t) = ℜg(t). At the same time, f(t) ≢ g(t) because g(t) is the characteristic function of a distribution concentrated on the positive half-line and not degenerate at the origin, therefore ℑg(t) ≢ 0 while ℑf(t) ≡ 0 due to the symmetry.
The characteristic function f_0(t) ≡ 1 is uniquely determined by its real part. Indeed, suppose that there exists a characteristic function g(t) such that

ℜg(t) = ℜf_0(t) ≡ 1

and

ℑg(t) ≢ ℑf_0(t) ≡ 0.

Then there exists t_0 such that ℑg(t_0) ≠ 0, and

|g(t_0)| = √(1 + [ℑg(t_0)]²) > 1,

which is impossible.
An open question: is the same true for the imaginary part of the characteristic function, i.e., is it true that the characteristic function is never determined by its imaginary part?

EXAMPLE 2. 'f(t) is a characteristic function' does not imply '|f(t)| is a characteristic function', and vice versa.
The simplest example of a characteristic function whose absolute value is not a characteristic function is f(t) = cos t, the characteristic function of a random variable X taking the values −1 and 1 with probabilities 1/2 each. Its absolute value is not a characteristic function because it is infinitely differentiable at zero but not differentiable at the points t = π/2 + πn, n = 0, 1, 2, ... (see Theorems 1.5.1 and 1.5.2).
The converse is obvious. Let g(t) be a real positive characteristic function, say, e^{-t²/2}. Then f(t) = −g(t) is not a characteristic function (f(0) = −1) while |f(t)| = g(t) is.

EXAMPLE 3. There exists a lattice distribution whose characteristic function is non-periodic.
In fact, any characteristic function f(t) of a lattice distribution concentrated on a set {a + nh, n = 0, ±1, ±2, ...}, where a is irrational and h is rational, is non-periodic. Indeed, suppose the contrary: f(t) is periodic. Then there exists t_0 ≠ 0 such that f(t_0) = 1. We have

1 = f(t_0) = Σ_{n=−∞}^∞ p_n e^{i(a+nh)t_0} = e^{iat_0} Σ_{n=−∞}^∞ p_n e^{inht_0},

hence for some integers n ≠ 0, m and k, nht_0 = 2πm and at_0 = 2πk, i.e., a = knh/m, which is rational (h is rational) and contradicts the assumption that a is irrational.

EXAMPLE 4. There exists an absolutely continuous distribution whose characteristic function is not integrable.
In fact, by Theorem 1.2.6, the characteristic function of any absolutely continuous distribution whose density is either unbounded or not continuous satisfies this condition.
Concrete examples:
(a) f(t) = sin t/t, the characteristic function of the uniform distribution on [−1, 1];
(b) e^{it/2} J_0(t/2), the characteristic function of the arcsine distribution, i.e., the distribution with the density

p(x) = 1/(π√(x(1 − x))), 0 < x < 1, and p(x) = 0 otherwise.

EXAMPLE 5. The characteristic function of an absolutely continuous distribution can tend to zero (as t → ∞) arbitrarily slowly. In other words, for any function φ(t) such that lim_{t→∞} φ(t) = 0, there exists an absolutely continuous distribution whose characteristic function f(t) satisfies the condition |f(t)|/|φ(t)| → ∞ as t → ∞.
Without loss of generality we may assume that φ(0) = 1, |φ(t)| ≤ 1, φ(t) ≠ 0 for all t, and φ(t) is continuous. Denote φ_0(t) = √|φ(t)|. It is easy to see that

φ_0(t)/|φ(t)| → ∞, t → ∞.

First let us prove that there exists a piecewise linear, continuous, non-negative function ψ(t), strictly decreasing for t > 0, such that ψ(0) = 1, ψ(t) ≤ 1, ψ(t) → 0 as t → ∞, and ψ(t) ≥ φ_0(t) for all sufficiently large t. Denote

t_0 = 0, t_n = sup{t: φ_0(t) > 1/(n + 1)}, n = 1, 2, ...

Define the function ψ(t) as follows. Let ψ(0) = 1, ψ(t_n) = 1/n, n = 2, 3, ..., and let ψ(t) be linear on the interval [0, t_2] and on each of the intervals [t_n, t_{n+1}], n = 2, 3, .... This function obviously satisfies the required conditions; in particular, ψ(t) ≥ φ_0(t) for t ≥ t_2. Moreover, ψ(t) converges to zero as t → ∞ because it decreases and takes all values 1/n, n = 1, 2, ...
Now, on the basis of the function ψ(t), we construct a function f(t) which satisfies the same conditions and, in addition, is convex for t > 0. We set

f(t'_n) = 1/n, n = 1, 2, ...,

where the points t'_1, t'_2, ... will be chosen later, and define f(t) to be linear on each interval [t'_n, t'_{n+1}], n = 1, 2, .... We set t'_1 = 0, t'_2 = t_2, and suppose that the first k points t'_1, ..., t'_k have been chosen. We then choose t'_{k+1} ≥ t_{k+1} so large that the absolute value of the slope of f(t) on the interval (t'_k, t'_{k+1}) is less than that on the interval (t'_{k−1}, t'_k); hence f(t) is convex for t > 0, and f(t) ≥ ψ(t) (t ≥ 0). For negative t, define f(t) by the equality f(t) = f(−t). By virtue of Theorem 1.3.9, f(t) is the characteristic function of an absolutely continuous distribution. On the other hand, by construction,

f(t)/|φ(t)| → ∞, t → ∞.

EXAMPLE 6. There exists a singular distribution whose characteristic function f(t) satisfies the relation lim sup_{|t|→∞} |f(t)| = 1.
Let a_1, a_2, ... be a sequence of positive numbers such that a_n ≥ 2 for all n, a_n → ∞ as n → ∞, and

∏_{n=1}^∞ (1 − 1/a_n) = 0.   (A.1)

Consider the characteristic function

f(t) = ∏_{n=1}^∞ ((1 − 1/a_n) + (1/a_n) e^{it/2^n}).

By (A.1), the corresponding distribution function has no points of discontinuity. On the other hand,

lim sup_{|t|→∞} |f(t)| = 1.

Another example: the characteristic function

f(t) = ∏_{n=1}^∞ cos(t/n!)

(Lukacs, 1970).

EXAMPLE 7. There exists a singular distribution whose characteristic function f(t) satisfies the relation lim sup_{|t|→∞} |f(t)| = 0.
We can set

f(t) = ∏_{k=1}^∞ cos(t/2^{2k−1})

or

f(t) = ∏_{k=1}^∞ cos(t/2^{2k})

(Lukacs, 1970). The product of these two characteristic functions is sin t/t, the characteristic function of the uniform distribution on [−1, 1].
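The factorization behind this example, ∏_{k=1}^∞ cos(t/2^k) = sin t/t, is easy to confirm numerically by truncating the product and splitting it into the odd-indexed and even-indexed factors; a short sketch:

```python
import math

def partial_product(t, ks):
    # product of cos(t / 2**k) over the given set of indices k
    p = 1.0
    for k in ks:
        p *= math.cos(t / 2.0 ** k)
    return p

t = 3.7
K = 40
odd = partial_product(t, range(1, K + 1, 2))   # cos(t/2) cos(t/8) ...
even = partial_product(t, range(2, K + 1, 2))  # cos(t/4) cos(t/16) ...
full = odd * even                              # = prod over all k <= K
print(abs(full - math.sin(t) / t))             # tiny truncation error
```

The omitted tail factors are within 2^{-2K} of 1, so the truncated product already matches sin t/t to high precision.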

EXAMPLE 8. For any p ∈ [0, 1], there exists a singular distribution whose characteristic function g(t) satisfies the relation lim sup_{|t|→∞} |g(t)| = p.
It suffices to take a characteristic function f(t) satisfying the conditions of Example 6 and a characteristic function h(t) satisfying the conditions of Example 7, and set

g(t) = p f(t) + (1 − p) h(t).

EXAMPLE 9. There exist a characteristic function f(t) of an absolutely continuous distribution and a characteristic function g(t) of a singular distribution such that |f(t)|/|g(t)| → ∞ as |t| → ∞.
This immediately follows from Examples 5 and 7.

EXAMPLE 10. There exists a real characteristic function f(t) which is negative for all sufficiently large t.
Consider the function

φ(t) = 1 for 0 ≤ t ≤ 1,   φ(t) = −e^{−t} for t > 1,   φ(t) = 0 for t < 0.

By virtue of Theorem 1.3.4, the function

f(t) = (1/c) ∫_{−∞}^∞ φ(s)φ(s + t) ds,

where

c = ∫_{−∞}^∞ φ²(s) ds = 1 + 1/(2e²),

is a characteristic function. Let us show that f(t) is negative for sufficiently large t. For t > 1 we have

f(t) = (1/c)( −∫_0^1 e^{−(s+t)} ds + ∫_1^∞ e^{−s} e^{−(s+t)} ds )
     = (e^{−t}/c)( −1 + e^{−1} + e^{−2}/2 ) < 0.

Another way to construct examples satisfying the required conditions is based on the use of Theorem 1.3.14. By this theorem, the function

f(t) = (1 − t²)/(1 + t²)²

and the function

f(t) = −(1/2) d²/dt² ( 1/(1 + t²) ) = (1 − 3t²)/(1 + t²)³

are characteristic functions. They obviously satisfy the required conditions.
Finally, point out the characteristic function (Oberhettinger, 1973)

f(t) = (1 − |t|) e^{−|t|},

which also satisfies the conditions of the example.
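The first construction can be verified numerically (nothing assumed beyond the formulas above): the autocorrelation integral is compared with the closed form e^{−t}(−1 + e^{−1} + e^{−2}/2)/c, which is negative.

```python
import math

C = 1.0 + 1.0 / (2.0 * math.e ** 2)    # c = integral of phi(s)^2 ds

def phi(s):
    # phi(s) = 1 on [0,1], -e^{-s} for s > 1, 0 for s < 0
    if s < 0.0:
        return 0.0
    return 1.0 if s <= 1.0 else -math.exp(-s)

def f_numeric(t, h=1e-3, upper=30.0):
    # trapezoidal approximation of (1/c) * integral phi(s) phi(s + t) ds
    n = int(upper / h)
    total = 0.5 * (phi(0.0) * phi(t) + phi(upper) * phi(upper + t))
    total += sum(phi(k * h) * phi(k * h + t) for k in range(1, n))
    return total * h / C

def f_closed(t):
    # closed form valid for t > 1
    return math.exp(-t) * (-1.0 + math.exp(-1.0) + math.exp(-2.0) / 2.0) / C

num, clo = f_numeric(2.0), f_closed(2.0)
print(num, clo)
```

Both values are negative and agree up to the quadrature error, confirming that f(t) < 0 for t > 1.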

EXAMPLE 11. For any interval [a, b] there exist two characteristic functions f(t) and g(t) such that f(t) = g(t) for t ∈ [a, b] but f(t) ≢ g(t).
This proposition is a special case of the following.

EXAMPLE 12. For any symmetric interval [−a, a] there exist two characteristic functions f(t) and g(t) such that f(t) = g(t) for t ∈ [−a, a] and f(t) ≠ g(t) for t ∈ R¹ \ [−a, a].
One can set, for example,

f(t) = 1 − |t|/2a for |t| ≤ 2a, and f(t) = 0 otherwise;

g(t) = 1 − |t|/2a for |t| ≤ a, and g(t) = 1/2 for |t| > a.

Both these functions are characteristic functions by Corollary 1.3.5.
The proposition admits some obvious generalizations (for wider classes of sets). A problem arises to give a description of the class 𝔅 of all closed symmetric sets B such that there exist two characteristic functions f(t) and g(t) with f(t) = g(t) for t ∈ B and f(t) ≠ g(t) for t ∈ R¹ \ B.

EXAMPLE 13. The shape of a distribution is not determined by the modulus of its characteristic function, even when the characteristic function is analytic and does not have zeros.
The characteristic function is never determined by its absolute value. In other words, for any characteristic function f(t), there exists a characteristic function (in fact, an infinite set of characteristic functions) g(t) such that g(t) ≢ f(t) while |g(t)| ≡ |f(t)|. This is obvious: a shift of a distribution and reflection about any point do not change the absolute value of the characteristic function. On the other hand, these simple operations do not change the shape of a distribution. However, there exist distributions of different shapes having the same absolute value of the characteristic function. A simple method of constructing such examples is given below.
Let φ(t) and ψ(t) be two characteristic functions such that φ(t) is symmetric and does not have zeros, while ψ(t)e^{ita} is non-symmetric for any a. Then the characteristic functions f(t) = φ(t)ψ(t) and g(t) = φ(t)ψ(−t) have the same absolute value but the corresponding distributions are of different shapes, i.e., f(t) ≢ g(t)e^{ita} and g(t) ≢ f(t)e^{ita} for any real a. Indeed, suppose the contrary: for some a, one of these identities holds, for instance, let it be the first one. Then

φ(t)ψ(t) = φ(t)ψ(−t)e^{ita},

and, since φ(t) ≠ 0 for all t,

ψ(t) = ψ(−t)e^{ita},

or

ψ(t)e^{−ita/2} = ψ(−t)e^{ita/2},

i.e., ψ(t)e^{−ita/2} is symmetric, which contradicts our assumption.
It is clear that the functions φ(t) and ψ(t) can be chosen in such a way that both f(t) and g(t) are analytic and do not have zeros.
The problem is closely related to the so-called phase problem in physics (see, e.g., (Kuznetsov & Ushakov, 1986)).

EXAMPLE 14. There exist two different real characteristic functions whose absolute values coincide.
The usually cited example is the following: two periodic functions f(t) and g(t) with periods 2 and 4, respectively, such that f(t) = 1 − |t| for |t| ≤ 1, and g(t) = 1 − |t| for |t| ≤ 2. The first function is a characteristic function by Theorem 1.3.12. The function g(t) is obtained from f(t) as

g(t) = 2f(t/2) − 1,

and therefore is a characteristic function by Theorem 1.3.13.
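Both periodic functions are easy to tabulate, and the claimed relations |f(t)| ≡ |g(t)| and f ≢ g can be checked on a grid; a plain numerical sketch of the construction above:

```python
def tri_periodic(t, period):
    # periodic extension of the central piece 1 - |t| on [-period/2, period/2]
    half = period / 2.0
    t = (t + half) % period - half   # reduce t to [-half, half)
    return 1.0 - abs(t)

def f(t):
    return tri_periodic(t, 2.0)   # period 2, f(t) = 1 - |t| on [-1, 1]

def g(t):
    return tri_periodic(t, 4.0)   # period 4, g(t) = 1 - |t| on [-2, 2]

grid = [k * 0.01 for k in range(-1000, 1001)]
same_abs = all(abs(abs(f(t)) - abs(g(t))) < 1e-9 for t in grid)
differ = any(abs(f(t) - g(t)) > 0.5 for t in grid)
print(same_abs, differ)
```

On [1, 2], for instance, f(t) = t − 1 while g(t) = 1 − t, so the functions differ although their absolute values agree.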

EXAMPLE 15. There exist characteristic functions f(t), g(t) and h(t) such that f(t)h(t) ≡ g(t)h(t) but f(t) ≢ g(t).
The simplest example is the following: f(t) and g(t) are from Example 12, and

h(t) = 1 − |t|/a for |t| ≤ a, and h(t) = 0 otherwise.

EXAMPLE 16. There exist characteristic functions f(t) and g(t) such that f²(t) ≡ g²(t) but f(t) ≢ g(t).
For this case, Example 14 is valid. An open question: do there exist two different characteristic functions f(t) and g(t) such that f^n(t) ≡ g^n(t) for some odd n?

EXAMPLE 17. For any symmetric interval (−a, a), there exists a characteristic function which is infinitely differentiable and vanishes outside (−a, a).
Let X_1, X_2, ... be a sequence of independent and identically distributed random variables with the uniform distribution on the interval [−1, 1], and let a_1, a_2, ... be a sequence of positive real numbers such that a_{2n−1} = a_{2n}, n = 1, 2, ..., and

Σ_{n=1}^∞ a_n = 1.

Denote b_n = a_{2n}. Let the random variable U be the sum of the absolutely convergent random series Σ_{n=1}^∞ a_n X_n. Then |U| ≤ 1 almost surely, and the characteristic function f(t) of U is given by

f(t) = ∏_{n=1}^∞ sin(a_n t)/(a_n t) = ∏_{n=1}^∞ (sin(b_n t)/(b_n t))².

It is clear that f(t) is non-negative, even and continuous. Since

f(t) ≤ 1/(b_1² t²),

f(t) is integrable over the real line; therefore U is absolutely continuous with the density

p(x) = (1/2π) ∫_{−∞}^∞ e^{−itx} f(t) dt.

Since f(t) is non-negative and integrable, there exists a constant α > 0 such that αf(t) is a probability density function; therefore, for some γ > 0, γp(t) is a characteristic function. Denote the corresponding distribution function by F(x).
Thus, the characteristic function of F(x), which is γp(t), vanishes outside the interval [−1, 1]. On the other hand, the density of F(x), which is αf(x), satisfies the inequality

f(t) ≤ (∏_{j=1}^n b_j)^{−2} |t|^{−2n}

for any positive integer n; hence F(x) has moments of all orders, and p(x) has derivatives of all orders (Theorem 1.5.2).
To obtain a characteristic function satisfying the conditions of the example, one can take

φ(t) = γ p(t/a).
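As a sanity check, the product can be truncated with a concrete choice of the a_n; below a_{2n−1} = a_{2n} = 2^{−(n+1)} (an illustrative choice satisfying Σ a_n = 1, so that b_n = 2^{−(n+1)}), and the sketch confirms non-negativity and the bound f(t) ≤ 1/(b_1²t²):

```python
import math

def sinc(x):
    return math.sin(x) / x if x != 0.0 else 1.0

def f_trunc(t, terms=60):
    # truncated prod_n (sin(b_n t)/(b_n t))^2 with b_n = 2^{-(n+1)}
    p = 1.0
    for n in range(1, terms + 1):
        p *= sinc(2.0 ** -(n + 1) * t) ** 2
    return p

b1 = 0.25
ts = [0.5 * k for k in range(1, 101)]
nonneg = all(f_trunc(t) >= 0.0 for t in ts)
bounded = all(f_trunc(t) <= 1.0 / (b1 * t) ** 2 for t in ts)
print(nonneg, bounded)
```

The first squared factor alone already gives the quadratic decay; the remaining factors only shrink the product further.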

EXAMPLE 18. There exist two different characteristic functions with equal absolute values and such that the corresponding distributions have the same moments of all orders.
Let a_1, a_2, ... be a sequence of positive numbers such that

Σ_{k=1}^∞ a_k < ∞.   (A.2)

Denote the triangular density on the interval [−2a_k, 2a_k] by p_k(x), k = 1, 2, ..., i.e.,

p_k(x) = (2a_k − |x|)/(4a_k²) for |x| ≤ 2a_k, and p_k(x) = 0 otherwise.

The corresponding characteristic functions are

f_k(t) = (sin(a_k t)/(a_k t))².

In view of (A.2), the product ∏_{k=1}^m f_k(t) converges as m → ∞. Denote

a = Σ_{k=1}^∞ a_k,   f(t) = ∏_{k=1}^∞ f_k(t).

It is easy to see that f(t) is continuous at zero; hence it is a characteristic function. The corresponding distribution is absolutely continuous. Denote its density by p(x). We have

p(0) = (1/2π) ∫_{−∞}^∞ f(t) dt.

Since f(t) ≥ 0 for all t, the function

f(x)/(2πp(0))

is a probability density function. Denote its characteristic function by f_0(t).
Now consider the following four functions:

g_1(t) = f_0(t) + ½[f_0(t + 4a) + f_0(t − 4a)],

g_2(t) = f_0(t) − ½[f_0(t + 4a) + f_0(t − 4a)],

q_1(x) = (1/(2πp(0))) f(x)(1 + cos(4ax)),   (A.3)

q_2(x) = (1/(2πp(0))) f(x)(1 − cos(4ax)).   (A.4)

The functions q_1(x) and q_2(x) are probability density functions, and g_1(t) and g_2(t) are the corresponding characteristic functions. The support of f_0(t) is contained in the interval [−2a, 2a]; therefore |g_1(t)| ≡ |g_2(t)|. From (A.3) and (A.4), one can easily see that for each positive integer n there exists c_n such that

q_1(x) ≤ c_n |x|^{−2n},   q_2(x) ≤ c_n |x|^{−2n};

hence both q_1(x) and q_2(x) possess moments of all orders. Since the supports of f_0(t ± 4a) do not reach the origin, we see that g_1(t) = g_2(t) = f_0(t) in some neighborhood of the origin, which implies that all moments of q_1(x) and q_2(x) coincide.
Thus, g_1(t) and g_2(t) satisfy the conditions of the counterexample.

EXAMPLE 19. For any −1 < a < 1, there exists a characteristic function f(t) which takes the value a on some interval (of the real line) of positive length.
If 0 ≤ a < 1, then one can take

f(t) = a + (1 − a) g(t),

where g(t) is any characteristic function vanishing on an interval of positive length, for example,

g(t) = 1 − |t| for |t| ≤ 1, and g(t) = 0 otherwise.

Let −1 < a < 0. Consider the characteristic function g(t) from Example 14, i.e., the periodic (with period 4) function such that g(t) = 1 − |t| for |t| ≤ 2. Let the numbers c and p satisfy the conditions 1/3 < c < 1 and 0 < p < 1. Consider the characteristic function h_{c,p}(t) given by

h_{c,p}(t) = p g(ct) + (1 − p) g(t).

For b = min{4, 2/c}, we have, on the interval [2, b],

h_{c,p}(t) = p(1 − ct) + (1 − p)(t − 3).

Set

p = 1/(1 + c);

then

h_{c,p}(t) = (1 − 3c)/(1 + c)

on the interval [2, b]; therefore it suffices to take

c = (1 − a)/(3 + a),

and f(t) = h_{c,p}(t) is equal to a on the interval [2, b].
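The construction is easy to verify numerically; the sketch below takes a = −1/2 (so that c = 0.6, p = 0.625 and b = 10/3) and checks that h_{c,p}(t) is constant and equal to a on [2, b]:

```python
def g(t):
    # the period-4 characteristic function of Example 14: 1 - |t| on [-2, 2]
    t = (t + 2.0) % 4.0 - 2.0
    return 1.0 - abs(t)

def h(t, c, p):
    return p * g(c * t) + (1.0 - p) * g(t)

a = -0.5
c = (1.0 - a) / (3.0 + a)      # = 0.6
p = 1.0 / (1.0 + c)            # = 0.625
b = min(4.0, 2.0 / c)          # = 10/3
ts = [2.0 + k * (b - 2.0) / 100.0 for k in range(101)]
max_dev = max(abs(h(t, c, p) - a) for t in ts)
print(max_dev)
```

The maximal deviation from a on [2, b] is at the level of floating-point rounding.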



EXAMPLE 20. For any a, there exist two characteristic functions f(t) and g(t) such that ℑf(t) = ℑg(t) for |t| ≤ a but ℑf(t) ≢ ℑg(t).
Consider, for example, two symmetric characteristic functions f_0(t) and g_0(t) satisfying the conditions of Example 12, i.e., such that f_0(t) = g_0(t) for |t| ≤ a and f_0(t) ≠ g_0(t) for |t| > a. Then the characteristic functions f(t) = f_0(t)e^{ibt} and g(t) = g_0(t)e^{ibt} satisfy the conditions of this example for any b ≠ 0.

EXAMPLE 21. For any a, there exists a characteristic function φ(t) such that ℑφ(t) = 0 for |t| ≤ a but ℑφ(t) ≢ 0.
We take the characteristic functions f(t) and g(t) from Example 20 and set

φ(t) = ½[f(t) + g(−t)].

Then the characteristic function φ(t) satisfies the conditions of this example.

EXAMPLE 22. There exists a non-symmetric characteristic function f(t) such that f²(t) is symmetric.
In terms of random variables, this means that the sum of two independent and identically distributed random variables with a non-symmetric distribution can have a distribution symmetric about zero.
Since ℑf²(t) = 2ℜf(t)ℑf(t), to construct such an example we have to find a characteristic function f(t) such that ℜf(t)ℑf(t) ≡ 0 while ℑf(t) ≢ 0. These conditions will be satisfied, for example, if we construct a characteristic function whose real part vanishes outside [−1, 1] whereas the imaginary part vanishes on [−1, 1].
Consider the probability density function

q(x) = (1 − cos x)/(πx²),

whose characteristic function is

g(t) = 1 − |t| for |t| ≤ 1, and g(t) = 0 otherwise.

For any a, the function

p(x) = q(x) + (2/a) q²(x) sin(ax)

is also a probability density function because

r(x) = q²(x) sin(ax)

is integrable and

∫_{−∞}^∞ r(x) dx = 0

(r(x) is an odd function), so

∫_{−∞}^∞ p(x) dx = 1,

and p(x) is non-negative: we have

(2/a) q²(x)|sin(ax)| ≤ (2/π) q(x) ≤ q(x),

because

(1 − cos x)/|x| ≤ 1

and

|sin(ax)|/|x| ≤ a.

Denote the characteristic function of p(x) by f(t) and the Fourier transform of r(x) by h(t). Since r(x) is an odd function, h(t) is purely imaginary, hence

ℜf(t) = g(t),   ℑf(t) = −(2i/a) h(t).

So, now it suffices to demonstrate that, under an appropriate choice of a, h(t) = 0 for |t| ≤ 1 (obviously, h(t) ≢ 0).
Suppose that a > 4. The function r(x) is the product of q(x) and q(x)sin(ax), whose Fourier transforms are g(t) and (1/(2i))[g(t + a) − g(t − a)], respectively. Thus, up to a constant factor, h(t) is the convolution of g(t) and g(t + a) − g(t − a). This implies that h(t) = 0 for |t| ≤ 1 because g(t + a) − g(t − a) = 0 for |t| < 3 (due to a > 4) and g(t) = 0 for |t| > 1.
Recently, in (Ramachandran, 1997), it was shown that for any n > 2 there exist characteristic functions f(t) (which may be chosen to be of a lattice distribution or absolutely continuous) which are infinitely differentiable and such that f^k(t), 1 ≤ k ≤ n − 1, are all asymmetric about the origin, while f^n(t) is symmetric.
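The non-negativity argument in the construction above can be checked on a grid: the quantity (2/a)q(x)|sin(ax)| stays below 2/π < 1, so p(x) ≥ q(x)(1 − 2/π) ≥ 0. A small numerical sketch with the illustrative choice a = 5:

```python
import math

def q(x):
    # the density (1 - cos x)/(pi x^2), extended by continuity at x = 0
    if x == 0.0:
        return 1.0 / (2.0 * math.pi)
    return (1.0 - math.cos(x)) / (math.pi * x * x)

def p(x, a):
    # the perturbed density q(x) + (2/a) q(x)^2 sin(ax)
    return q(x) + (2.0 / a) * q(x) ** 2 * math.sin(a * x)

a = 5.0
xs = [k * 0.01 for k in range(-5000, 5001)]
min_p = min(p(x, a) for x in xs)
ratio = max((2.0 / a) * q(x) * abs(math.sin(a * x)) for x in xs)
print(min_p, ratio)
```

The computed maximum of (2/a)q(x)|sin(ax)| remains safely below 2/π, and the density p stays positive throughout the grid.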

EXAMPLE 23. There exist a discrete distribution and an absolutely continuous one whose characteristic functions coincide on an interval containing the origin.
The function

f(t) = 1 − |t| for |t| ≤ 1, and f(t) = 0 otherwise,

and the periodic function g(t), which has period 2 and coincides with f(t) on the interval [−1, 1] (see Example 14), are valid.
Another, more interesting, example is derived from Example 17 and Theorem 1.3.11. In this case, the characteristic functions are infinitely differentiable.

EXAMPLE 24. There exists a characteristic function which is differentiable but the expectation of the corresponding distribution does not exist.
The characteristic function

f(t) = c Σ_{n=2}^∞ cos(nt)/(n² log n),

where c is the normalizing constant, satisfies these conditions. This example, given in (Zygmund, 1947), is very well known and often cited (see, e.g., (Lukacs, 1970)), and we omit the proof.

EXAMPLE 25. There exists a characteristic function f(t) which is infinitely differentiable at every point except the origin but no absolute moment of positive order of the corresponding distribution exists.
Let a random variable X have the characteristic function

f(t) = c Σ_{n=1}^∞ n^{−2} e^{−e^n |t|}

(a mixture of Cauchy distributions with scale parameters e^n), where c is the normalizing constant. For any fixed t different from the origin, there exists a neighborhood of t where the series on the right-hand side, differentiated term by term, converges uniformly. Therefore, for this t, the derivative of f(t) exists. The same is true for the derivative of any order. On the other hand,

E|X|^a = ∞

for any a > 0.

EXAMPLE 26. There exist characteristic functions which are not differentiable at any point.
As an example, we can take the Weierstrass function

f(t) = (1 − a) Σ_{k=0}^∞ a^k cos(b^k t),

where 0 < a < 1 and b is an odd integer with ab > 1 + 3π/2; it is the characteristic function of the discrete distribution concentrated at the points ±b^k with probabilities (1 − a)a^k/2, k = 0, 1, 2, ....
EXAMPLE 27. There exists a characteristic function which is the characteristic
function of a sum X + Y of two random variables and is the product of the charac-
teristic functions of the summands but the summands are not independent.
A classical example is the characteristic function of the Cauchy distribu-
tion. Suppose that X = Y and they have the Cauchy distribution with density
p(x) = 1/(π(1 + x²)).
272 Appendix A. Examples

Of course, X and Y are dependent. The characteristic function of each sum-
mand is f₀(t) = e^{−|t|}, and the characteristic function of the sum is

f(t) = E e^{it(X+Y)} = E e^{it(X+X)} = E e^{i2tX} = e^{−2|t|} = f₀²(t).
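The identity above can be checked numerically; the following sketch (ours, not part of the original text) only uses the closed forms e^{−|t|} and e^{−2|t|}:

```python
import math

def f0(t):
    # Characteristic function of the standard Cauchy distribution
    return math.exp(-abs(t))

def f_sum(t):
    # Characteristic function of X + X = 2X for standard Cauchy X
    return math.exp(-2 * abs(t))

# The product rule holds despite the total dependence of the summands
for t in [-3.0, -0.7, 0.0, 0.5, 2.4]:
    assert abs(f_sum(t) - f0(t) ** 2) < 1e-12
```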

EXAMPLE 28. There exists a sequence of characteristic functions f₁(t), f₂(t), ...
which converges at all points to a function which is not a characteristic function.
Let, for example,
f_n(t) = 1 − n|t| for |t| ≤ 1/n, f_n(t) = 0 otherwise.
Then the sequence f₁(t), f₂(t), ... converges at all points to the function
f(t) = 1 for t = 0, f(t) = 0 otherwise,
which, obviously, is not a characteristic function (it is not continuous at the origin).
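The exact form of f_n in the source is partly illegible; the sketch below uses the Pólya-type triangular functions f_n(t) = max(0, 1 − n|t|) (an assumption of ours), which are characteristic functions and exhibit exactly the limit described:

```python
def f_n(t, n):
    # Polya-type characteristic function: triangular with support [-1/n, 1/n]
    return max(0.0, 1.0 - n * abs(t))

def limit(t, big_n=10**6):
    # Approximates the pointwise limit by taking n very large
    return f_n(t, big_n)

# Pointwise limit: 1 at the origin, 0 at every other point ...
assert limit(0.0) == 1.0
for t in [0.1, -0.5, 2.0]:
    assert limit(t) == 0.0
# ... which is discontinuous at 0, hence not a characteristic function.
```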

EXAMPLE 29. The uniform (on the whole line) convergence of a sequence of char-
acteristic functions f₁(t), f₂(t), ... to a characteristic function f(t) does not imply
that the sequence of the corresponding distributions converges in variation.
The distance in variation between two distributions F and G is defined as

v(F, G) = sup_A |F(A) − G(A)|,

where the supremum is taken over all Borel sets A of the real line.
It is not hard to show that the convergence in variation of a sequence of
univariate probability distributions implies the uniform (on the whole line)
convergence of the corresponding characteristic functions. The present exam-
ple demonstrates that the converse is not true.
In view of Example 7, there exists a singular distribution F such that its
characteristic function f(t) satisfies the condition

lim_{|t|→∞} f(t) = 0.

Consider the sequence of distributions F₁, F₂, ..., where F_n = F * N_n and N_n is the
normal distribution with zero mean and variance 1/n. Denote the characteris-
tic function of F_n by f_n, n = 1, 2, ... (then f_n(t) = f(t)e^{−t²/2n}). All distributions
F_n are absolutely continuous, therefore

v(F_n, F) = 1 ↛ 0 as n → ∞.

At the same time, for any ε > 0, there exist a = a(ε) and n₀ = n₀(ε) such that
|f(t)| < ε/2 for |t| > a, and 1 − e^{−a²/2n} < ε for n > n₀. Thus, for n > n₀,

sup_{t∈R¹} |f_n(t) − f(t)| = max{ sup_{|t|≤a} |f_n(t) − f(t)|, sup_{|t|>a} |f_n(t) − f(t)| }
  ≤ max{ 1 − e^{−a²/2n}, sup_{|t|>a} |f_n(t)| + sup_{|t|>a} |f(t)| }
  ≤ max{ 1 − e^{−a²/2n}, 2 sup_{|t|>a} |f(t)| } < ε,

i.e., f_n(t) → f(t) as n → ∞ uniformly on the whole line.

EXAMPLE 30. 'f(t) is the characteristic function of a unimodal distribution' does
not imply 'f²(t) is the characteristic function of a unimodal distribution' (but
does so for |f(t)|²).
In terms of random variables, this means that the unimodality of indepen-
dent and identically distributed random variables X₁ and X₂ does not imply
the unimodality of their sum X₁ + X₂.
Let X₁ have the distribution with the density

p(x) = a for 0 < x < 1, p(x) = b for 1 < x < 3, p(x) = 0 otherwise,

where a and b are such positive numbers that a + 2b = 1 and a > 2b, say,
a = 2/3, b = 1/6. Then p(x) is unimodal. Denote the density of X₁ + X₂ by q(x).
Simple calculations show that q(1) = a², q(2) = 2ab, q(3) = 2ab + b², which
yields q(1) > q(2) < q(3), i.e., q(x) is not unimodal.
The unimodality of the distribution corresponding to |f(t)|² was proved in
(Hodges & Lehmann, 1954).
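The values q(1), q(2), q(3) can be confirmed by numerical convolution (a quick check of ours, not part of the book):

```python
def p(x):
    # Density of X1: a = 2/3 on (0,1), b = 1/6 on (1,3)
    if 0.0 < x < 1.0:
        return 2.0 / 3.0
    if 1.0 < x < 3.0:
        return 1.0 / 6.0
    return 0.0

def q(y, dx=1e-3):
    # Density of X1 + X2 via a midpoint Riemann sum of the convolution integral
    total = 0.0
    for i in range(int(3.0 / dx)):
        x = (i + 0.5) * dx
        total += p(x) * p(y - x)
    return total * dx

q1, q2, q3 = q(1.0), q(2.0), q(3.0)
# q(1) = a^2 = 4/9, q(2) = 2ab = 2/9, q(3) = 2ab + b^2 = 1/4
assert q1 > q2 < q3   # the density of the sum is not unimodal
```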

EXAMPLE 31. The factorization of a characteristic function into indecomposable
factors may not be unique.
Consider the characteristic function

f(t) = (1/6) Σ_{k=0}^{5} e^{itk},

which is the characteristic function of the discrete uniform distribution on the
set {0, 1, 2, 3, 4, 5}, and the functions

f₁(t) = (1/3)(1 + e^{2it} + e^{4it}),  f₂(t) = (1/2)(1 + e^{it}),

g₁(t) = (1/3)(1 + e^{it} + e^{2it}),  g₂(t) = (1/2)(1 + e^{3it}).

Then
f(t) = f₁(t)f₂(t) = g₁(t)g₂(t).
Each of the characteristic functions f₂(t) and g₂(t) corresponds to a two-point
distribution; hence they are indecomposable (see, e.g., (Lukacs, 1970)). Thus,
it only remains to show that f₁(t) and g₁(t) are also indecomposable. Suppose
that g₁(t) is decomposable: g₁(t) = g₁₁(t)g₁₂(t), where g₁₁(t) and g₁₂(t) are non-
trivial factors. It is obvious that g₁(t) corresponds to a distribution, say, G₁(x),
concentrated in three points, 0, 1, 2, each with probability 1/3. However,
the discontinuity points of G₁(x) are of the type x_j + y_k, where x_j and y_k are
discontinuity points of the distributions corresponding to the characteristic
functions g₁₁(t) and g₁₂(t) respectively (Lukacs, 1970).
Since G₁(x) has three discontinuity points and g₁₁(t), g₁₂(t) are non-trivial,
we conclude that

g₁₁(t) = pe^{itx₁} + (1 − p)e^{itx₂},  g₁₂(t) = qe^{ity₁} + (1 − q)e^{ity₂},

where 0 < p < 1, 0 < q < 1. But g₁(t) = g₁₁(t)g₁₂(t) implies that p and q must
satisfy the relations

pq = (1 − p)(1 − q) = p(1 − q) + q(1 − p) = 1/3,

which is impossible.
Thus, we have shown that g₁(t) is indecomposable. f₁(t) is also indecom-
posable because f₁(t) = g₁(2t).
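The two factorizations can be verified numerically (a sketch of ours using the explicit functions above):

```python
import cmath

def f(t):
    # Characteristic function of the uniform distribution on {0, ..., 5}
    return sum(cmath.exp(1j * t * k) for k in range(6)) / 6.0

def f1(t): return (1 + cmath.exp(2j * t) + cmath.exp(4j * t)) / 3.0
def f2(t): return (1 + cmath.exp(1j * t)) / 2.0
def g1(t): return (1 + cmath.exp(1j * t) + cmath.exp(2j * t)) / 3.0
def g2(t): return (1 + cmath.exp(3j * t)) / 2.0

# Two distinct factorizations of the same characteristic function
for t in [0.0, 0.3, 1.7, -2.5]:
    assert abs(f(t) - f1(t) * f2(t)) < 1e-12
    assert abs(f(t) - g1(t) * g2(t)) < 1e-12
```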

EXAMPLE 32. There exists a characteristic function f(t) which is not infinitely
divisible, whereas its absolute value |f(t)| is.
It is obvious that if f(t) is an infinitely divisible characteristic function,
then |f(t)| is also an infinitely divisible characteristic function. The present
counterexample, constructed in (Gnedenko & Kolmogorov, 1954), demonstrates
that the converse is false.
Let 0 < a < b < 1. Consider the function

f(t) = (1 − b)(1 + ae^{−it}) / ((1 + a)(1 − be^{it})).

It admits the representation

f(t) = ((1 − b)/(1 + a)) (1 + ae^{−it}) Σ_{k=0}^∞ b^k e^{itk};

hence f(t) is the characteristic function of a random variable X taking values
−1, 0, 1, 2, ..., and such that

P(X = −1) = (1 − b)a/(1 + a),  P(X = k) = ((1 − b)(1 + ab)/(1 + a)) b^k,  k = 0, 1, 2, ...

Let us demonstrate that f(t) is not infinitely divisible. We have

log f(t) = Σ_{k=1}^∞ [(−1)^{k−1} k^{−1} a^k (e^{−itk} − 1) + k^{−1} b^k (e^{itk} − 1)].

Thus, if f(t) is written in its canonical form in the Lévy–Khintchine represen-
tation (see, e.g., (Linnik & Ostrovskii, 1977)), then G(x) is a piecewise constant
function with jumps of amplitude k b^k/(1 + k²) at x = k and of amplitude
(−1)^{k−1} k a^k/(1 + k²) at x = −k for k = 1, 2, ... It is easy to see that G(x) is not
monotone, therefore f(t) cannot be infinitely divisible.
Now consider the characteristic function

g(t) = |f(t)|² = f(t)f(−t).

We have

log g(t) = Σ_{k=1}^∞ k^{−1}(b^k + (−1)^{k−1} a^k)(e^{−itk} − 1)
         + Σ_{k=1}^∞ k^{−1}(b^k + (−1)^{k−1} a^k)(e^{itk} − 1).

Thus, in the Lévy–Khintchine formula for g(t), we have γ = 0 and G(x) is a
piecewise constant function with jumps of amplitude k(b^k + (−1)^{k−1} a^k)/(1 + k²) at the points
x = ±k, k = 1, 2, ... Since a < b, these jumps are positive, so the function G(x) is non-decreasing; therefore g(t) = |f(t)|²
is infinitely divisible. This immediately implies that |f(t)| is also infinitely
divisible.

EXAMPLE 33. There exists a characteristic function f(t) which coincides with
the normal characteristic function e^{−t²/2} on some interval but f(t) ≢ e^{−t²/2}.
Consider the function

f(t) = 1 − (1/2)(1 − e^{−2})|t| for |t| ≤ 2,  f(t) = e^{−t²/2} for |t| > 2.

This function is continuous, decreasing, and convex for t > 0, hence, by Theo-
rem 1.3.9, it is a characteristic function.
Of course, an interval in examples of such kind cannot contain the origin:
if a characteristic function coincides with the normal characteristic function

on an interval containing the origin, then they coincide everywhere. An open
question: is it true that for any closed interval which does not contain the
origin there exists a characteristic function f(t) such that f(t) coincides with
the normal characteristic function e^{−t²/2} on this interval but f(t) ≢ e^{−t²/2}?

EXAMPLE 34. Any mixture (linear combination) of the normal characteristic
function with an infinitely divisible characteristic function of a discrete distribu-
tion is an indecomposable characteristic function. In other words, any charac-
teristic function of the form pe^{−t²/2} + (1 − p)f(t), where 0 < p < 1 and f(t) is
a characteristic function of a discrete infinitely divisible distribution, is inde-
composable.
We include this example because it covers some counterexamples often
cited (such as 'a distribution which is symmetric but is not the result of a
symmetrization procedure', 'a characteristic function which never vanishes
but is not infinitely divisible', etc.). In addition, it sounds quite unexpected.
A complete proof was presented in (Ushakov & Ushakov, 1986); we give its
basic idea. Suppose that

pe^{−t²/2} + (1 − p)f(t) = φ(t)ψ(t), (A.5)

where φ(t) and ψ(t) are some non-degenerate characteristic functions. φ(t) and
ψ(t) can be represented as

φ(t) = αφ_d(t) + (1 − α)φ_c(t),

ψ(t) = βψ_d(t) + (1 − β)ψ_c(t),

where 0 < α ≤ 1, 0 < β ≤ 1, φ_d(t) and ψ_d(t) are characteristic functions of
discrete distributions, φ_c(t) and ψ_c(t) are characteristic functions of continuous
distributions. Then we have

αβ φ_d(t)ψ_d(t) = (1 − p)f(t), (A.6)

α(1 − β)φ_d(t)ψ_c(t) + β(1 − α)ψ_d(t)φ_c(t) + (1 − α)(1 − β)φ_c(t)ψ_c(t) = pe^{−t²/2}.
(A.7)

Denote the distribution functions corresponding to the characteristic functions
φ_d(t), φ_c(t), ψ_d(t), ψ_c(t), f(t), and e^{−t²/2} by G_d(x), G_c(x), H_d(x), H_c(x), F(x), and
Φ(x) respectively. We have

∫_{−∞}^∞ e^{σx²} dΦ(x) < ∞ (A.8)

for some σ > 0 and

∫_{−∞}^∞ e^{σx²} dF(x) = ∞ (A.9)

for all σ > 0 (Kruglov, 1970). Relations (A.7) and (A.8) yield

∫_{−∞}^∞ e^{σx²} dG_d(x) < ∞,  ∫_{−∞}^∞ e^{σx²} dG_c(x) < ∞,
∫_{−∞}^∞ e^{σx²} dH_d(x) < ∞,  ∫_{−∞}^∞ e^{σx²} dH_c(x) < ∞,

for some σ > 0, while (A.6) and (A.9) imply that at least one of the integrals
∫ e^{σx²} dG_d(x), ∫ e^{σx²} dH_d(x) is infinite for all σ > 0. This contradiction shows that (A.5) is impossible.

EXAMPLE 35. Let ℱ = ℱ(a, c) be the set of characteristic functions whose dis-
tributions are absolutely continuous, concentrated on the same interval [−c, c],
and their densities are bounded by the same constant a (a > 1/c). There does not
exist a function g(t) such that lim_{t→∞} g(t) = 0 and |f(t)| ≤ g(t) for all f(t) ∈ ℱ.
In other words, there exist ε > 0, a sequence of characteristic functions
f₁(t), f₂(t), ... from ℱ(a, c), and a sequence t₁, t₂, ..., such that t_n → ∞ as
n → ∞, and
lim inf_{n→∞} |f_n(t_n)| > ε
(although f_k(t) → 0 as |t| → ∞ for each k).
Assume for simplicity that c = π (and hence a > 1/π). Consider the sets

A_n = {x ∈ [−π, π] : cos(nx) ≥ 0}, n = 1, 2, ...

Let f₁(t), f₂(t), ... be the characteristic functions of the uniform distributions on the
sets A₁, A₂, ... Set t_n = n, n = 1, 2, ... Then

f_n(t_n) = (1/π) ∫_{A_n} cos(nx) dx = 2/π for every n.

EXAMPLE 36. For any a > 0, ε > 0, and any real T there exists an absolute-
ly continuous distribution function F(x) with density p(x) and characteristic
function f(t) such that sup_x p(x) ≤ a and |f(T)| > 1 − ε.
Let arbitrary positive a, ε and real T be fixed. Consider the set of the real
line

A_{T,ε} = {x : cos(Tx) > 1 − ε}.

This set is of infinite Lebesgue measure; therefore there exists a subset B_a of
A_{T,ε} having the Lebesgue measure 1/a. Let F(x) be the uniform distribution
on the set B_a. Then sup_x p(x) = a and

|f(T)| ≥ ∫_{−∞}^∞ cos(Tx)p(x) dx = a ∫_{B_a} cos(Tx) dx > a (1/a)(1 − ε) = 1 − ε.

This example demonstrates, in particular, that the unimodality condition
in Theorem 2.4.3 cannot be omitted.
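The construction can be carried out concretely; the script below (ours, with the particular values T = 5, ε = 0.1, a = 2 chosen for illustration) builds B_a from intervals around the points 2πk/T and checks the lower bound:

```python
import math

T, eps, a = 5.0, 0.1, 2.0
delta = math.acos(1.0 - eps)   # cos(Tx) > 1 - eps iff Tx mod 2*pi lies in (-delta, delta)

# Collect intervals around the points 2*pi*k/T until their total length is 1/a
intervals, total, k = [], 0.0, 0
while total < 1.0 / a:
    c = 2.0 * math.pi * k / T
    half = delta / T
    length = min(2.0 * half, 1.0 / a - total)
    intervals.append((c - half, c - half + length))
    total += length
    k += 1

# The density of the uniform distribution on B_a equals a on B_a, so sup p = a,
# and |f(T)| >= a * integral of cos(Tx) over B_a > 1 - eps
integral = 0.0
n = 10000
for lo, hi in intervals:
    h = (hi - lo) / n
    integral += sum(math.cos(T * (lo + (i + 0.5) * h)) for i in range(n)) * h
assert a * integral > 1.0 - eps
```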

EXAMPLE 37. The function f(t) = 1 − ‖t‖ for ‖t‖ ≤ 1 and f(t) = 0 for ‖t‖ > 1,
t ∈ R^m, is not a characteristic function if m > 1.
It is well known that in the one-dimensional case the function

f(t) = 1 − |t| for |t| ≤ 1,  f(t) = 0 for |t| > 1,

is a (univariate) characteristic function. This fact is usually used to prove the
Pólya criterion (Theorem 1.3.9) as well as to construct various examples of
characteristic functions satisfying some given conditions. It turns out that in the
multi-dimensional case the function

f(t) = 1 − ‖t‖ for ‖t‖ ≤ 1,  f(t) = 0 for ‖t‖ > 1,

is not a characteristic function. For m ≥ 3 this immediately follows from
Theorem 1.8.20. For m = 2 the proof was given in (Velikoivanenko, 1987).
This example shows that there is no straightforward extension of the Pólya
criterion to the multi-dimensional case.

EXAMPLE 38. There exists a bivariate characteristic function f(s, t) which is the
product of the univariate marginal characteristic functions in some neighbourhood
of the origin but not everywhere on the plane. In other words, there exists a
bivariate characteristic function f(s, t) such that f(s, t) = f₁(s)f₂(t), |s| < ε, |t| < ε,
for some ε > 0, where f₁(s) and f₂(t) are the marginal characteristic functions:
f₁(s) = f(s, 0), f₂(t) = f(0, t), but f(s, t) ≢ f₁(s)f₂(t).
Consider the bivariate probability density

p(x, y) = (2/π²) ((1 − cos x)(1 − cos y)/(x²y²)) cos²(x + y).

Its characteristic function is

f(s, t) = (1 − |s|)(1 − |t|),                 |s| ≤ 1, |t| ≤ 1,
f(s, t) = (1/2)(1 − |s − 2|)(1 − |t − 2|),   |s − 2| ≤ 1, |t − 2| ≤ 1,
f(s, t) = (1/2)(1 − |s + 2|)(1 − |t + 2|),   |s + 2| ≤ 1, |t + 2| ≤ 1,
f(s, t) = 0 otherwise.

It is easy to see that

f(s, t) = f(s, 0)f(0, t)

for |s| ≤ 1, |t| ≤ 1, but, for example,

f(2, 2) = 1/2 ≠ 0 = f(2, 0)f(0, 2).

EXAMPLE 39. There exists a symmetric (real) characteristic function f(t) which
has roots but the real part of the corresponding empirical characteristic function
f_n(t) has no roots with some positive probability for any n = 1, 2, ... (the
probability bound does not depend on n).
It is obvious that the real part of an empirical characteristic function may
have roots even when the real part of the underlying characteristic function
does not (this is the case for the symmetric normal distribution or, more gen-
erally, for all symmetric stable laws). This example demonstrates that the
converse is also true.
Consider the distribution F concentrated in three points −1, 0, 1 with prob-
abilities 1/4, 1/2 and 1/4, respectively. The corresponding characteristic func-
tion is f(t) = (1/2)(1 + cos t) and has the roots π + 2πk, k = 0, ±1, ±2, ... Let us
demonstrate that
P(u_n(t) > 0 for all t) ≥ 1/4
for any n = 1, 2, ..., where u_n(t) is the real part of the empirical characteristic
function f_n(t). Without loss of generality assume that n is even. Let X₁, ..., X_n
be a random sample from the distribution F. Consider the event

A = {more than n/2 of the variables X₁, ..., X_n are equal to zero}.

Then on the event A we have

u_n(t) = (1/n)(#{i : X_i = 0} + #{i : X_i ≠ 0} cos t) > 0 for all t,

and, since the number of zero observations is binomial with parameters (n, 1/2),
P(A) ≥ 1/4 for every even n.

Characteristic functions of some
distributions

In this appendix, we give formulas and graphs of some frequently used charac-
teristic functions and characteristic functions demonstrating interesting prop-
erties (as illustrations to the examples of Appendix A). Extensive tables of
characteristic functions can be found in (Oberhettinger, 1973). Almost all
distributions below are either absolutely continuous or integer-valued.
Distributions are given by the probability density function p(x) in the ab-
solutely continuous case and by the probabilities p(k) for discrete
distributions. The characteristic function is denoted by f(t). Parameters of a
distribution are indicated after the argument and separated from the latter by
a semicolon.
Some examples are included into some others as special cases (for instance,
the arcsine distribution is a special case of the beta distribution).
In the figures, the absolute value of a characteristic function, its real and
imaginary parts are represented, respectively, by the solid, dashed and dotted
curves. In the discrete case, the characteristic function is usually represented
on the interval [0, 2π]; in the absolutely continuous case, the length of the
interval where a characteristic function is represented is chosen so that all
essential information on the behavior of the characteristic function is available.
The letters j, k, l, m, n, M, and N (with or without indices) stand for integer
numbers.
The distributions are placed in alphabetical order.

282 Appendix B. Some characteristic functions

ARCSINE DISTRIBUTION

p(x) = 1/(π√(x(1 − x))) for 0 < x < 1, p(x) = 0 otherwise;

f(t) = e^{it/2} J₀(t/2),

where

J₀(z) = Σ_{k=0}^∞ (−1)^k (z/2)^{2k} / (k!)²

is the Bessel function.

Figure B.1. The characteristic function of the arcsine distribution


283

BESSEL DISTRIBUTION

p(x; p) = (p/x) e^{−x} I_p(x) for x > 0, p(x; p) = 0 otherwise,

p > 0, where

I_p(z) = Σ_{k=0}^∞ (1/(k! Γ(k + p + 1))) (z/2)^{2k+p};

f(t) = [1 − it + √(−it(2 − it))]^{−p}.

Figure B.2. The ch.f. of the Bessel distribution with parameter p = 2



BETA DISTRIBUTION

p(x; p, q) = x^{p−1}(1 − x)^{q−1}/B(p, q) for 0 < x < 1, p(x; p, q) = 0 otherwise, p > 0, q > 0;

f(t) = (Γ(p + q)/Γ(p)) Σ_{k=0}^∞ (Γ(p + k)/Γ(p + q + k)) (it)^k/k!.

Figure B.3. The ch.f. of the Beta distribution with parameters p = 2, q = 1/2

Figure B.4. The ch.f. of the Beta distribution with parameters p = 1/2, q = 2

BILATERAL EXPONENTIAL DISTRIBUTION (STANDARD LAPLACE DISTRIBUTION)

p(x) = (1/2) e^{−|x|};

f(t) = 1/(1 + t²).

Figure B.5. The ch.f. of the bilateral exponential (standard Laplace) distribution



BINOMIAL DISTRIBUTION

p(k; p, n) = C(n, k) p^k (1 − p)^{n−k}, k = 0, 1, 2, ..., n;

n is a positive integer, 0 < p < 1;

f(t) = [1 + p(e^{it} − 1)]^n.

Figure B.6. The ch.f. of the binomial distribution with parameters p = 0.3, n = 10
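The closed form can be checked against the defining expectation (a quick numeric sketch of ours):

```python
import cmath
from math import comb

def cf_binomial(t, n, p):
    # Closed form: [1 + p(e^{it} - 1)]^n
    return (1 + p * (cmath.exp(1j * t) - 1)) ** n

def cf_direct(t, n, p):
    # Direct expectation: sum_k C(n,k) p^k (1-p)^(n-k) e^{itk}
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) * cmath.exp(1j * t * k)
               for k in range(n + 1))

for t in [0.0, 0.4, 1.9, -2.2]:
    assert abs(cf_binomial(t, 10, 0.3) - cf_direct(t, 10, 0.3)) < 1e-12
```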
288 Appendix . Some characteristic functions

CAUCHY DISTRIBUTION

p(x; a, b) = b/(π[(x − a)² + b²]), −∞ < a < ∞, b > 0;

f(t) = exp{iat − b|t|}.

Figure B.7. The ch.f. of the Cauchy distribution with parameters a = 3, b = 1



CONVEX KHINTCHINE DISTRIBUTION

p(x) = (1 − cos x)/(πx²);

f(t) = 1 − |t| for |t| ≤ 1, f(t) = 0 otherwise.

Figure B.8. The characteristic function of the convex Khintchine distribution



DISCRETE UNIFORM DISTRIBUTION

p(k; j, l) = 1/l for k = j, j + 1, ..., j + l − 1, p(k; j, l) = 0 for other k

(j an arbitrary integer, l a positive integer);

f(t) = exp{it[j + (l − 1)/2]} sin(lt/2) / (l sin(t/2)).

Figure B.9. The ch.f. of the discrete uniform distribution with j = 0, l = 5

EXPONENTIAL DISTRIBUTION

p(x; λ) = λe^{−λx} for x > 0, p(x; λ) = 0 otherwise, λ > 0;

f(t) = λ/(λ − it).

Figure B.10. The ch.f. of the exponential distribution with parameter λ = 1



GAMMA DISTRIBUTION

p(x; λ, p) = λ^p x^{p−1} e^{−λx}/Γ(p) for x > 0, p(x; λ, p) = 0 otherwise, λ > 0, p > 0;

f(t) = (1 − it/λ)^{−p}.

The graph of a representative of gamma distributions (the exponential
distribution, p = 1) was given above.

Figure B.11. The ch.f. of the gamma distribution with parameters λ = 1, p = 3



GEOMETRIC DISTRIBUTION

p(k; p) = p(1 − p)^{k−1}, k = 1, 2, ..., 0 < p < 1;

f(t) = pe^{it} / (1 − (1 − p)e^{it}).

Figure B.12. The ch.f. of the geometric distribution with parameter p = 0.4

HYPERBOLIC COSINE DISTRIBUTION

p(x) = 1/(π cosh x);

f(t) = 1/cosh(πt/2).

Figure B.13. The ch.f. of the hyperbolic cosine distribution



HYPEREXPONENTIAL DISTRIBUTION

p(x; m, α, λ) = Σ_{k=1}^m α_k λ_k e^{−λ_k x} for x > 0, p(x; m, α, λ) = 0 otherwise,

α_k > 0, λ_k > 0, k = 1, ..., m, Σ_{k=1}^m α_k = 1, m is a positive integer;

f(t) = Σ_{k=1}^m α_k λ_k/(λ_k − it).

Figure B.14. The characteristic function of the hyperexponential distribution
with parameters m = 3, α₁ = 0.2, α₂ = 0.3, α₃ = 0.5, λ₁ = 1, λ₂ = 2,
and λ₃ = 3

LAPLACE DISTRIBUTION

p(x; a, b) = (1/(2b)) exp{−|x − a|/b}, −∞ < a < ∞, b > 0;

f(t) = e^{iat}/(1 + b²t²).

Figure B.15. The ch.f. of the Laplace distribution with parameters a = 1, b = 0.5

LOGARITHMIC DISTRIBUTION

p(k; p) = −(1 − p)^k/(k ln p), k = 1, 2, ..., 0 < p < 1;

f(t) = ln(1 − (1 − p)e^{it})/ln p.

Figure B.16. The ch.f. of the logarithmic distribution with p = 0.2



LOGISTIC DISTRIBUTION

p(x; a, b) = (1/b) exp{−(x − a)/b} / (1 + exp{−(x − a)/b})², −∞ < a < ∞, b > 0;

f(t) = e^{iat} πbt / sinh(πbt).

Figure B.17. The ch.f. of the logistic distribution with a = 1 and b = 1



NEGATIVE BINOMIAL DISTRIBUTION

p(k; n, p) = C(n + k − 1, k) p^n (1 − p)^k, k = 0, 1, 2, ...,

n is a positive integer, 0 < p < 1;

f(t) = p^n / [1 − (1 − p)e^{it}]^n.

Figure B.18. The ch.f. of the negative binomial distribution with n = 5 and p = 0.7

NORMAL DISTRIBUTION

p(x; a, σ) = (1/(σ√(2π))) exp{−(x − a)²/(2σ²)}, −∞ < a < ∞, σ > 0;

f(t) = exp{iat − σ²t²/2}.

Figure B.19. The ch.f. of the normal distribution with parameters a = 3, σ² = 1



PEARSON DISTRIBUTION OF THE THIRD TYPE

p(x; a, p) = ((p/a)^{p+1}/Γ(p + 1)) (x − a)^p e^{−p(x−a)/a} for x > a, p(x; a, p) = 0 otherwise, a > 0, p > 0;

f(t) = e^{iat} (1 − iat/p)^{−p−1}.

Figure B.20. The ch.f. of the Pearson distribution of the 3rd type with a = 3, p = 1

PEARSON DISTRIBUTION OF THE SEVENTH TYPE

p(x; a) = (Γ(a)/(√π Γ(a − 1/2))) (1 + x²)^{−a}, a > 1/2;

f(t) = (2/Γ(a − 1/2)) (|t|/2)^{a−1/2} K_{a−1/2}(|t|),

where

K_ν(z) = (π/(2 sin(πν))) [I_{−ν}(z) − I_ν(z)],

and

I_ν(z) = Σ_{n=0}^∞ (1/(n! Γ(ν + n + 1))) (z/2)^{ν+2n}

is the modified Bessel function.

Figure B.21. The ch.f. of the Pearson distribution of the 7th type with a = 1.5

POISSON DISTRIBUTION

p(k; λ) = λ^k e^{−λ}/k!, k = 0, 1, 2, ..., λ > 0;

f(t) = exp{λ(e^{it} − 1)}.

Figure B.22. The ch.f. of the Poisson distribution with parameter λ = 2
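Again the closed form can be checked against a (truncated) direct expectation; the sketch below is ours:

```python
import cmath
import math

def cf_poisson(t, lam):
    # Closed form: exp(lam * (e^{it} - 1))
    return cmath.exp(lam * (cmath.exp(1j * t) - 1))

def cf_series(t, lam, terms=80):
    # Truncated expectation: sum_k e^{itk} lam^k e^{-lam} / k!
    return sum(cmath.exp(1j * t * k) * lam**k * math.exp(-lam) / math.factorial(k)
               for k in range(terms))

for t in [0.0, 0.7, 2.5]:
    assert abs(cf_poisson(t, 2.0) - cf_series(t, 2.0)) < 1e-12
```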



STABLE DISTRIBUTION. p(x; α, β, γ, a) is the density of a random variable X such
that X has the same distribution as bY + cZ + d, where Y and Z are inde-
pendent and distributed as X; b, c and d are constants. The conditions for
the parameters are the following: 0 < α ≤ 2, |β| ≤ 1, γ > 0, −∞ < a < ∞.

f(t) = exp{iat − γ|t|^α [1 + iβ (t/|t|) ω(t, α)]},

where

ω(t, α) = tan(πα/2), α ≠ 1;  ω(t, α) = (2/π) log|t|, α = 1.

Figure B.23. The ch.f. of the stable distribution with α = 1.2, β = 1, γ = 1, a = 0



Figure B.24. The ch.f. of the stable distribution with α = 0.3, β = −1, γ = 1, a = 0



STUDENT DISTRIBUTION (t-DISTRIBUTION) WITH a DEGREES OF FREEDOM

p(x; a) = (Γ((a + 1)/2)/(√(πa) Γ(a/2))) (1 + x²/a)^{−(a+1)/2};

the characteristic function admits a closed elementary form provided that (a + 1)/2 is an integer; for a = 3 (the case shown in the figure),

f(t) = (1 + √3 |t|) e^{−√3 |t|}.

Figure B.25. The ch.f. of the Student distribution with a = 3



TRIANGULAR DISTRIBUTION (SIMPSON DISTRIBUTION)

p(x; a, b) = (2/(b − a)) (1 − |a + b − 2x|/(b − a)) for a < x < b, p(x; a, b) = 0 otherwise,
−∞ < a < ∞, b > a;

f(t) = [ 2(e^{ibt/2} − e^{iat/2}) / (it(b − a)) ]².

Figure B.26. The ch.f. of the triangular distribution with a = 0, b = 2



UNIFORM (RECTANGULAR) DISTRIBUTION

p(x; a, b) = 1/(b − a) for a < x < b, p(x; a, b) = 0 otherwise, −∞ < a < ∞, b > a;

f(t) = (e^{ibt} − e^{iat}) / (it(b − a)).

Figure B.27. The ch.f. of the uniform (rectangular) distribution with a = 0, b = 2
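The closed form agrees with a direct numerical evaluation of the defining integral (1/(b − a)) ∫_a^b e^{itx} dx; the sketch below is ours:

```python
import cmath

def cf_uniform(t, a, b):
    # Closed form: (e^{ibt} - e^{iat}) / (it(b - a)), with the t = 0 limit equal to 1
    if t == 0:
        return 1.0 + 0.0j
    return (cmath.exp(1j * b * t) - cmath.exp(1j * a * t)) / (1j * t * (b - a))

def cf_numeric(t, a, b, n=20000):
    # Midpoint Riemann sum of (1/(b-a)) * integral of e^{itx} over [a, b]
    h = (b - a) / n
    return sum(cmath.exp(1j * t * (a + (i + 0.5) * h)) for i in range(n)) * h / (b - a)

for t in [0.5, 1.0, 3.7]:
    assert abs(cf_uniform(t, 0.0, 2.0) - cf_numeric(t, 0.0, 2.0)) < 1e-6
```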



z-DISTRIBUTION (distribution of the Fisher dispersion ratio with m₁ and m₂
degrees of freedom)

p(x; m₁, m₂) = (2 m₁^{m₁/2} m₂^{m₂/2} Γ((m₁ + m₂)/2) / (Γ(m₁/2) Γ(m₂/2))) e^{m₁x} / (m₂ + m₁e^{2x})^{(m₁+m₂)/2},

where m₁ and m₂ are positive integers.

Figure B.28. The ch.f. of the z-distribution with m₁ = 1, m₂ = 7



Figure B.29. The ch.f. of the z-distribution with m₁ = 5, m₂ = 2



χ²-DISTRIBUTION (WITH n DEGREES OF FREEDOM)

p(x; n) = (1/(2^{n/2} Γ(n/2))) x^{n/2−1} e^{−x/2} for x > 0, p(x; n) = 0 otherwise,

where n is a positive integer;

f(t) = 1/(1 − 2it)^{n/2}.

Figure B.30. The ch.f. of the χ²-distribution with n = 5



p(x) = (x²/√(2π)) e^{−x²/2}, −∞ < x < ∞;

f(t) = (1 − t²) e^{−t²/2}.

This characteristic function f(t) is negative for all |t| > 1 (see Example 10
of Appendix A). The next characteristic function possesses the same
property.

Figure B.31.

p(x) = 2x²/(π(1 + x²)²);

f(t) = (1 − |t|) e^{−|t|}.

Figure B.32.

p(x) = 2/(π(1 + x²)²);

f(t) = (1 + |t|) e^{−|t|}.

Figure B.33.

p(x; a) = (1/Γ(a)) exp{ax − e^x}, a > 0;

f(t) = Γ(a + it)/Γ(a).

Figure B.34. a = 4

p(x) = (1/4) e^{−|x|}(1 + |x|);

f(t) = 1/(1 + t²)².

Figure B.35.

p(x) = exp{−x − e^{−x}};

f(t) = Γ(1 − it).

Figure B.36.

p(x; a, b) = (a/(√π x^{3/2})) exp{2a√b − a²/x − bx} for x > 0, a > 0, b > 0;

f(t) = exp{2a(√b − √(b − it))}.

Figure B.37. a = 1, b = 3

p(x) = π/(4 cosh²(πx/2));

f(t) = t/sinh t.

Figure B.38.

p(x) = x/(2 sinh(πx/2));

f(t) = 1/cosh² t.

Figure B.39.

p(x; p, q) = (Γ(p + q)/(Γ(p)Γ(q))) e^{−px}(1 − e^{−x})^{q−1} for x > 0, p > 0, q > 0;

f(t) = Γ(p + q)Γ(p − it) / (Γ(p)Γ(p + q − it)).

Figure B.40. p = 0.5, q = 3



Figure B.41. p = 1, q = 0.5



p(x; p) = (4^p Γ(p + 1/2)/(√π Γ(p))) (e^x + e^{−x})^{−2p}, p > 0;

f(t) = Γ(p + it/2)Γ(p − it/2)/Γ²(p).

Figure B.42. p = 1

p(x; p, q) = (Γ(p + q)/(Γ(p)Γ(q))) e^{px}(1 + e^x)^{−(p+q)}, p > 0, q > 0;

f(t) = Γ(p + it)Γ(q − it)/(Γ(p)Γ(q)).

Figure B.43. p = 1, q = 5

p(x) = (2π)^{−1/2} x^{−3/2} e^{−1/(2x)} for x > 0;

f(t) = exp{−|t|^{1/2}(1 − i sign t)}.

Figure B.44.

p(k) = 1/2 for k = 0,  p(k) = (1 − cos(πk))/(π²k²) for k ≠ 0;

f(t) = 1 − |t|/π

for |t| ≤ π and is periodic with period 2π.

The absolute value of this characteristic function coincides with the ab-
solute value of the next characteristic function, although these functions
are different.

Figure B.45.
Figure B.46.


The imaginary part of this characteristic function vanishes in some


neighborhood of the origin but not everywhere.

Figure B.47.

f(t) = p e^{−|t|^{1/2}} + (1 − p) cos t,

where p is the solution of the simultaneous equations (in p and t)

p e^{−t^{1/2}} + (1 − p) cos t = 0,

(p/(2t^{1/2})) e^{−t^{1/2}} + (1 − p) sin t = 0.

For this characteristic function, the first positive zero of the real part
of the empirical characteristic function is not a consistent estimator of
the first positive zero of the characteristic function although both these
zeros exist and are isolated.

Figure B.48.
Bibliography

Ash, R.B. (1965). Information Theory. Wiley-Interscience, New York.


Askey, R. (1973). Radial Characteristic Functions. Tech. Rep. 1262. Math. Res.
Center, University of Wisconsin-Madison.
Askey, R. (1975). Some characteristic functions of unimodal distributions. J.
Math. Anal. Appl. 50, 465-469.
Bahadur, R.R. (1960). On the asymptotic efficiency of tests and estimates.
Sankhyā A22, 229-252.
Baringhaus, L., Danschke, R., and Henze, N. (1989). Recent and classical tests
for normality: a comparative study. Commun. Statist. Simul. 18, 363-379.
Baringhaus, L., and Henze, N. (1988). A consistent test for multivariate nor-
mality based on the empirical characteristic function. Metrika 35, 339-
348.
Bartlett, M.S. (1963). Statistical estimation of density functions. Sankhyā A25,
245-254.
Benke, G., and Hendricks, W.J. (1992). Tail estimates for empirical charac-
teristic functions with applications to random arrays. In: Probability in
Banach Spaces 8, Birkhäuser, Boston, pp. 469-478.
Bentkus, V., and Götze, F. (1996). Optimal rates of convergence in the CLT for
quadratic forms. Ann. Probab. 24, 466-490.
Berman, S.M. (1975). A new characterization of characteristic functions of
absolutely continuous distributions. Pacific J. Math. 58, 323-329.
Bhattacharya, R.N., and Ranga Rao, R. (1976). Normal Approximation and
Asymptotic Expansions. Wiley, New York.
Bikelis, A. (1970). Inequalities for multivariate characteristic functions. Liet.
Mat. Rink. 10, 5-12.


Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.

Binmore, K.G., and Stratton, H.H. (1969). A note on characteristic functions.


Ann. Math. Statist. 40, 303-307.
Blum, J.R., Kiefer, J., and Rosenblatt, M. (1961). Distribution free tests of inde-
pendence based on the sample distribution function. Ann. Math. Statist.
32,485-498.

Blum, J.R., and Susarla, V. (1977). A Fourier inversion method for the estima-
tion of a density and its derivatives. J. Austral. Math. Soc. A23,166-171.

Boas, R.P. (1967). Lipschitz behavior and integrability of characteristic func-


tions. Ann. Math. Statist. 38, 32-36.

Borisov, I.S. (1995). Bounds for characteristic functions of additive functionals


of order statistics. Siberian Adv. Math. 5, 1-15.
Borovskikh, Yu.V. (1985). Estimates of characteristic functions of certain ran-
dom variables with applications to ω²-statistics. I. Theory Probab. Appl.
29, 488-503.
Bräker, H.U., and Hüsler, J. (1991). On the first zero of an empirical charac-
teristic function. J. Appl. Prob. 28, 593-601.

Brown, B.M. (1972). Formulae for absolute moments. J. Australian Math. Soc.
13,104-106.

Cambanis, S., Keener, R., and Simons, G. (1983). On -symmetric multivariate


distributions. J. Multivariate Anal. 13, 213-233.
Çapar, U. (1992). Empirical characteristic functional analysis and inference
in sequence spaces. In: Probabilistic and Stochastic Methods in Analysis,
with Applications, NATO Adv. Sci. Inst. Ser. C Math. Phys. Sci., 372,
Kluwer, Dordrecht, pp. 517-534.

Chambers, R.L., and Heathcote, C.R. (1981). On the estimation of slope and
the identification of outliers in linear regression. Biometrika 68, 21-33.

Chiu, S.-T. (1991). Bandwidth selection for kernel density estimation. Ann.
Statist. 19, 1883-1905.
Chung, K.L. (1974). A Course in Probability Theory. Academic Press, London.

Cramér, H. (1946). Mathematical Methods of Statistics. Princeton Univ. Press,
Princeton.
Cramér, H. (1962). Random Variables and Probability Distributions. Cam-
bridge Univ. Press, Cambridge.

Csörgő, S. (1980). On the quantogram of Kendall and Kent. J. Appl. Probab.
17, 440-447.
Csörgő, S. (1981a). Limit behaviour of the empirical characteristic function.
Ann. Probab. 9 (1), 130-144.
Csörgő, S. (1981b). Multivariate characteristic functions and tail behaviour. Z.
Wahrscheinlichkeitstheorie und verw. Gebiete 55, 197-202.
Csörgő, S. (1981c). Multivariate empirical characteristic functions. Z.
Wahrscheinlichkeitstheorie und verw. Gebiete 55, 203-229.
Csörgő, S. (1981d). The empirical characteristic process when parameters are
estimated. In: Contributions to Probability. A Collection of Papers Dedicated to Eugene
Lukacs (Gani, J., and Rohatgi, V.K., Eds). Academic Press, New York,
pp. 215-220.
Csörgő, S. (1982). The empirical moment generating function. Coll. Math. Soc.
J. Bolyai 32: Nonparametric Statistical Inference (Gnedenko, B.V., Puri,
M.L., and Vincze, I., Eds). Elsevier, Amsterdam, pp. 139-150.
Csörgő, S. (1983). The theory of functional least squares. J. Austral. Math. Soc.
A34, 336-355.
Csörgő, S. (1984). Estimating characteristic functions under random censor-
ship. Theory Probab. Appl. 28, 615-623.
Csörgő, S. (1985). Testing for independence by the empirical characteristic
function. J. Multiv. Anal. 16, 290-299.
Csörgő, S. (1986). Testing for normality in arbitrary dimension. Ann. Statist.
14, 708-732.
Csörgő, S. (1989). Consistency of some tests for multivariate normality. Metrika
36, 107-116.
Csörgő, S., and Hall, P. (1982). Estimable versions of Griffiths' measure of
association. Austral. J. Statist. 24, 296-308.
Csörgő, S., and Heathcote, C.R. (1982). Some results concerning symmetric
distributions. Bull. Austral. Math. Soc. 25, 327-335.
Csörgő, S., and Heathcote, C.R. (1987). Testing for symmetry. Biometrika 74,
177-184.
Csörgő, S., and Teugels, J.L. (1990). Empirical Laplace transform and approx-
imation of compound distributions. J. Appl. Prob. 27, 88-101.

Csörgő, S., and Totik, V. (1983). On how long interval is the empirical charac-
teristic function uniformly consistent? Acta Sci. Math. 45, 141-149.

Cuppens, R. (1975). Decomposition of multivariate probabilities. Academic


Press, New York.

Daugavet, A.I., and Petrov, V.V. (1987). A generalization of the Esseen inequal-
ity for the concentration function. J. Soviet Math. 36, 473-476.

Davis, K.B. (1975). Mean square error properties of density estimates. Ann.
Statist. 3,1025-1030.

Davis, K.B. (1977). Mean integrated square error properties of density esti-
mates. Ann. Statist., 5, 530-535.

De Silva, B.M., and Griffiths, C.R. (1980). A test of independence for bivariate
symmetric stable distributions. Austral. J. Statist. 22, 172-177.

Devroye, L. (1992). A note on the usefulness of superkernels in density esti-


mation. Ann. Statist. 20, 2037-2056.

Devroye, L. (1994). On the non-consistency of an estimate of Chiu. Statist.


Probab. Lett. 20,183-188.

Devroye, L., and Györfi, L. (1985). Nonparametric Density Estimation: The L1
View. Wiley, New York.

Dharmadhikari, S., and Joag-Dev, K. (1988). Unimodality, Convexity, and Ap-


plications. Academic Press, New York.

Doob, J.L. (1953). Stochastic Processes. Wiley, New York.

van Eeden, C. (1985). Mean integrated squared error of kernel estimators
when the density and its derivatives are not necessarily continuous. Ann.
Inst. Statist. Math. A37, 461-472.

van Es, A.J. (1997). A note on the integrated squared error of a kernel density
estimator in non-smooth cases. Statist. Probab. Lett. 35, 241-250.

Epps, T.W., and Pulley, L.B. (1983). A test for normality based on the empirical
characteristic function. Biometrika 70, 723-726.

Esseen, C.-G. (1945). Fourier analysis of distribution functions. A mathemati-


cal study of the Laplace-Gaussian law. Acta Mathematica 77,1-125.

Esseen, C.-G. (1968). On the concentration function of a sum of independent
random variables. Z. Wahrscheinlichkeitstheorie und verw. Gebiete 9, 290-
308.

Fan, Y. (1997). Goodness-of-fit tests for a multivariate distribution by the em-
pirical characteristic function. J. Multivariate Anal. 62, 36-63.

Fang, K.-T., Kotz, S., and Ng, K.W. (1989). Symmetric Multivariate and Related
Distributions. Chapman and Hall, London.

Feller, W. (1971). An Introduction to Probability Theory and its Applications,


2. Wiley, New York.

Feigin, P.D., and Heathcote, C.R. (1976). The empirical characteristic function
and the Cramér-von Mises statistic. Sankhyā A38, 309-325.
Feuerverger, A. (1987). On some ECF procedures for testing independence.
In: Time Series and Econometric Modelling (MacNeill, I.B., and Umphrey,
G.J., Eds). Reidel, New York, pp. 189-206.

Feuerverger, A. (1990). An efficiency result for the empirical characteristic


function in stationary time-series models. Canadian J. Statist. 18, 155-
161.
Feuerverger, A. (1993). A consistent test for bivariate dependence. Intern.
Statist. Review 61, 419-433.
Feuerverger, ., and McDunnough, P. (1981a). On the efficiency of empirical
characteristic function procedures. J. R. Statist. Soc. B43, 20-27.
Feuerverger, ., and McDunnough, P. (1981b). On some Fourier methods for
inference. J. Amer. Statist. Assoc. 76 (374), 379-387.

Feuerverger, ., and McDunnough, P. (1981c). On efficient inference in sym-


metric stable laws and processes. In: Proc. Intern. Symp. Statistics
and Related Topics (Csrg, ., Dawson, D.A., Rao, J.N.K., and Saleh,
A.K.M.E., Eds). North-Holland, Amsterdam, pp. 109-121.

Feuerverger, ., and Mureika, R.A. (1977). The empirical characteristic func-


tion and its applications. Aran. Statist. 5, 88-97.
Frazer, D.A.S. (1957). Nonparametric Methods in Statistics. Wiley, New York.

Freedman, D.A., and Diaconis, P. (1982). DeFinetti's theorem for symmetric


location families. Ann. Statist. 10,184-189.
Gajek, G. (1986). On improving density estimators which are not bona fide
functions. Ann. Statist. 14,1612-1618.

Galambos, J. (1988). Advanced Probability Theory. Marcel Dekker, New York.


Gamkrelidze, N.G. (1984). On an estimate of the closeness in variation of
distributions. Theory Probab. Appl. 28, 467-469.
Gil-Pelaez, J. (1951). Note on the inversion theorem. Biometrika 38, 481-482.

Glad, I.K., Hjort, N.L., and Ushakov, N.G. (1999a). Upper Bounds for the MISE of Kernel Density Estimators. Preprint Dept. Statistics, University of Oslo.

Glad, I.K., Hjort, N.L., and Ushakov, N.G. (1999b). Correction of Density Estimators which are not Densities. Preprint Dept. Statistics, University of Oslo.

Glad, I.K., Hjort, N.L., and Ushakov, N.G. (1999c). Density Estimation Using the Sine Kernel. Preprint Dept. Statistics, University of Oslo.

Gnedenko, B.V., and Kolmogorov, A.N. (1954). Limit Distributions for Sums of Independent Random Variables. Addison-Wesley, Reading, MA.

Gneiting, T. (1998). On α-symmetric multivariate characteristic functions. J. Multivariate Anal. 64, 131-147.

Götze, F., Prokhorov, Yu.V., and Ulyanov, V.V. (1996). Bounds for characteristic functions of polynomials in asymptotically normal random variables. Russian Math. Surveys 51, 181-204.

Hall, P., and Murison, R.D. (1993). Correcting the negativity of high-order kernel density estimators. J. Multivariate Anal. 47, 103-122.

Hall, P., and Welsh, A.H. (1983). A test for normality based on the empirical characteristic function. Biometrika 70, 485-489.

Heathcote, C.R. (1972). A test of goodness of fit for symmetric random variables. Austral. J. Statist. 14, 172-181.

Heathcote, C.R. (1977). The integrated squared error estimation of parameters. Biometrika 64, 255-264.

Heathcote, C.R. (1982). The theory of functional least squares. J. Appl. Probab. A19, 225-239.

Heathcote, C.R., and Hüsler, J. (1990). The first zero of an empirical characteristic function. Stoch. Proc. Appl. 35, 347-360.

Heathcote, C.R., and Pitman, J.W. (1972). An inequality for characteristic functions. Bull. Austral. Math. Soc. 16, 1-9.

Hengartner, W., and Theodorescu, R. (1973). Concentration Functions. Academic Press, London.

Henze, N. (1990). An approximation to the limit distribution of the Epps-Pulley test statistic for normality. Metrika 37, 7-18.
Henze, N., and Wagner, T. (1997). A new approach to the BHEP tests for multivariate normality. J. Multivariate Anal. 62, 1-23.

Henze, N., and Zirkler, B. (1990). A class of invariant and consistent tests for multivariate normality. Commun. Statist. Theory Methods 19, 3595-3617.

Hjort, N.L., and Glad, I.K. (1995). Nonparametric density estimation with a parametric start. Ann. Statist. 23, 882-904.

Hodges, J.L., and Lehmann, E.L. (1954). Matching in paired comparisons. Ann. Math. Statist. 25, 787-791.

Hoeffding, W. (1948a). A class of statistics with asymptotically normal distribution. Ann. Math. Statist. 19, 293-325.

Hoeffding, W. (1948b). A nonparametric test for independence. Ann. Math. Statist. 19, 546-557.

Hoeffding, W. (1961). On sequences of sums of independent random vectors. In: Proc. 4th Berkeley Symp. on Math. Statist. and Probab. 2, Univ. Calif. Press, Berkeley, pp. 213-226.

Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58, 13-30.

Holtsmark, J. (1919). Über die Verbreiterung von Spektrallinien. Annalen der Physik 58, 577-630.

Hsu, P.L. (1951). Absolute moments and characteristic function. J. Chinese Math. Sci. 1, 259-280.

Hüsler, J. (1989). First zeros of empirical characteristic functions and extreme values of Gaussian processes. In: Statistical Data Analysis and Inference (Dodge, Y., Ed.). North Holland, Amsterdam, pp. 177-182.

Ibragimov, I.A., and Khasminskii, R.Z. (1982). Estimation of distribution density belonging to a class of entire functions. Theory Probab. Appl. 27, 551-562.

Ibragimov, I.A., and Linnik, Yu.V. (1971). Independent and Stationary Sequences of Random Variables. Wolters-Noordhoff, Groningen.

Kagan, A.M., Linnik, Yu.V., and Rao, C.R. (1973). Characterization Problems in Mathematical Statistics. Wiley, New York.

Kallenberg, O. (1974). On extrapolation of characteristic functions. Scand. Actuarial J., 100-102.
Kankainen, A. (1995). Consistent Testing of Total Independence Based on the Empirical Characteristic Function. Ph.D. Thesis, Jyväskylä Univ. Press, Jyväskylä.

Kankainen, A., and Ushakov, N.G. (1998). A consistent modification of a test for independence based on the empirical characteristic function. J. Math. Sci. 89, 1582-1589.

Kawada, T. (1981). Sample functions of Polya processes. Pacific J. Math. 97, 125-135.

Kawata, T. (1972). Fourier Analysis in Probability Theory. Academic Press, New York.

Keller, H.-D. (1988). Large deviations of the empirical characteristic function. Acta Sci. Math. 52, 207-214.

Kent, J.T. (1975). A weak convergence theorem for the empirical characteristic function. J. Appl. Prob. 12, 515-523.

Khintchine, A.Ya. (1938). On unimodal distribution functions. Izv. Nauchno-Issled. Inst. Mat. Mekh. Tomsk. Gos. Univ. 2, 1-7 (in Russian).

Kolchinskii, V.I. (1989). Limit theorems for empirical characteristic functionals in Banach spaces. Theor. Probability and Math. Statist. 39, 83-91.

Konakov, V.D. (1972). Non-parametric estimation of density functions. Theory Probab. Appl. 17, 361-362.

Koutrouvelis, I.A. (1980a). A goodness-of-fit test of simple hypotheses based on the empirical characteristic function. Biometrika 67, 238-240.

Koutrouvelis, I.A. (1980b). Regression-type estimation of the parameters of stable laws. J. American Statist. Assoc. 75, 918-928.

Koutrouvelis, I.A. (1981). An iterative procedure for the estimation of the parameters of stable laws. Comm. Statist. B10, 17-28.

Koutrouvelis, I.A. (1982). Estimation of location and scale in Cauchy distributions using the empirical characteristic function. Biometrika 69, 205-213.

Koutrouvelis, I.A., and Kellermeier, J. (1981). A goodness-of-fit test based on the empirical characteristic function when parameters must be estimated. J. R. Statist. Soc. B43, 173-176.

Kronmal, R., and Tarter, M. (1968). The estimation of probability densities and cumulatives by Fourier series methods. J. American Statist. Assoc. 63, 925-952.
Kruglov, V.M. (1970). A note on infinitely divisible distributions. Theory Probab. Appl. 15, 319-324.

Krysichi, W., and Kakuszka, M. (1993). Some inequalities for characteristic functions. Zeszyty Nauk. Politech. Lodz. Mat. #25, 13-18.

Kuznetsov, S.M., and Ushakov, N.G. (1986). The problem of the numerical restoration of a wave front from the intensity distribution. U.S.S.R. Comput. Maths. Math. Phys. 26, 100-103.

Lamperti, J. (1966). Probability. Dartmouth College, New York.

Larin, Yu.V. (1993). On concentration of distributions of sums of independent random vectors on bounded sets. Theory Probab. Appl. 38, 743-751.

Levy, P. (1925). Calcul des Probabilites. Gauthier-Villars, Paris.

Lenth, R.V., Markatou, M., and Tsimikas, J. (1995). Robust tests based on the sample characteristic function. Austral. J. Statist. 37, 45-60.

Linnik, Yu.V., and Ostrovskii, I.V. (1977). The Decomposition of Random Variables and Vectors. American Math. Soc., Providence.

Loeve, M. (1977). Probability Theory. Springer, Berlin.

Lukacs, E. (1970). Characteristic Functions. Griffin, London.

Lukacs, E. (1983). Developments in Characteristic Function Theory. Griffin, London.

Lukacs, E., and Laha, R.G. (1964). Applications of Characteristic Functions. Griffin, London.

Mandelbrot, B. (1960). The Pareto-Levy law and the distribution of income. Int. Econ. Rev. 1, 79-106.

Mandelbrot, B. (1963). The variation of certain speculative prices. J. Business 36, 394-419.

Marcinkiewicz, J. (1938). Sur une propriete de la loi de Gauss. Math. Zeitschr. 44, 612-618.

Marcus, M.B. (1981). Weak convergence of the empirical characteristic function. Ann. Probab. 9, 194-201.

Markatou, M., and Horowitz, J.L. (1995). Robust scale estimation in the error-components model using the empirical characteristic function. Canad. J. Statist. 23, 369-381.
Markatou, M., Horowitz, J.L., and Lenth, R.V. (1995). A robust scale estimator based on the empirical characteristic function. Statist. Probab. Lett. 25, 185-192.

Medgyessy, P. (1972). On the unimodality of discrete distributions. Period. Math. Hung. 2, 245-257.

Meintanis, S.G., and Donatos, G.S. (1994). A characterization of the Cauchy distribution based on the empirical characteristic function. In: Hellenic European Research on Mathematics and Informatics '94. Hellenic Math. Soc., Athens, pp. 963-968.

Meshalkin, L.D., and Rogozin, B.A. (1962). An estimate of the distance between distribution functions in terms of the proximity of their characteristic functions and its applications to the central limit theorem. In: Limit Theorems of Probability Theory: Proc. All-Union Colloq., Fergana 1962. Tashkent, 1963, pp. 49-55.

Murota, K. (1981). A test for normality based on the empirical characteristic function. Rep. Statist. Appl. Res. Un. Japan Sci. Engrs. 28, 1-17.

Murota, K., and Takeuchi, K. (1981). The studentized empirical characteristic function and its application to test for the shape of distribution. Biometrika 68, 55-65.

Naito, K. (1996). On weighting the studentized empirical characteristic function for testing normality. Commun. Statist. Simul. 25, 201-213.

Navard, S., Seaman, J., and Young, D. (1993). A characterization of discrete unimodality with applications to variance upper bounds. Ann. Inst. Statist. Math. 45, 603-614.

Oberhettinger, F. (1973). Fourier Transforms of Distribution Functions and their Inverses. Academic Press, London.

Ostrovskii, I.V. (1986). Arithmetic of probabilistic distributions. Theory Prob. Appl. 31, 3-30.

Parzen, E. (1962). On the estimation of a probability density function and the mode. Ann. Math. Statist. 33, 1065-1076.

Paulson, A.S., Holcomb, E.W., and Leitch, R.A. (1975). The estimation of the parameters of the stable laws. Biometrika 62, 163-170.

Petrov, V.V. (1975). Sums of Independent Random Variables. Springer, Berlin.

Pham Dinh Tuan (1995). On the discretisation error in the computation of the empirical characteristic function. J. Statist. Comput. Simulation 53, 129-141.
Polya, G. (1949). Remarks on characteristic functions. Proc. 4th Berkeley Symp. Math. Statist. & Probab. Univ. California Press, Berkeley, pp. 115-123.

Popov, V.A. (1987). On the inequalities of Berry-Esseen and V.M. Zolotarev. Lecture Notes Math. 1233, 114-124.

Postnikova, L.P., and Yudin, A.A. (1977). On the concentration function. Theory Probab. Appl. 22, 362-366.

Postnikova, L.P., and Yudin, A.A. (1980). An analytic method for estimates of the concentration function. Proc. Steklov Inst. Math. 143, 153-161.

Prakasa Rao, B.L.S. (1983). Nonparametric Functional Estimation. Academic Press, New York.

Prawitz, H. (1972). Limits for a distribution, if the characteristic function is given in a finite domain. Skand. Aktuartidskr. 55, 138-154.

Prawitz, H. (1973). Ungleichungen für den absoluten Betrag einer charakteristischen Funktion. Skand. Aktuartidskr. 56, 11-16.

Prawitz, H. (1975). Weitere Ungleichungen für den absoluten Betrag einer charakteristischen Funktion. Scand. Actuartidskr. 58, 21-28.

Prawitz, H. (1991). Noch einige Ungleichungen für charakteristische Funktionen. Scand. Actuartidskr. 74, 49-73.

Press, S.J. (1972a). Estimation in univariate and multivariate stable distributions. J. American Statist. Assoc. 67, 842-846.

Press, S.J. (1972b). Applied Multivariate Analysis. Holt, New York.

Prokhorov, Yu.V. (1961). The method of characteristic functionals. In: Proc. Fourth Berkeley Symp. Math. Stat. Probab. II, pp. 403-420.

Prokhorov, Yu.V. (1962). Extremal problems in limit theorems. In: Proc. Sixth All-Union Conf. Theor. Probab. Math. Statist. (Vilnius, 1960). Gos. Izdat. Politichesk. i Nauchn. Lit. Litovsk. SSR, Vilnius, pp. 77-84.

Prokhorov, Yu.V. (1965). On a characterization of a class of probability distributions by distributions of some statistics. Theory Prob. Appl. 10, 438-445.

Prokhorov, Yu.V., and Rozanov, Yu.A. (1969). Probability Theory, Basic Concepts, Limit Theorems, and Random Processes. Springer, Berlin.

Raikov, D.A. (1940). On positive definite functions. Soviet Math. Dokl. 26, 857-862.
Ramachandran, B. (1967). Advanced Theory of Characteristic Functions. Statist. Publ. Soc., Calcutta.

Ramachandran, B. (1969). On characteristic functions and moments. Sankhyā A31, 1-12.

Ramachandran, B. (1996a). Complex-valued characteristic functions with (some) real powers. Sankhyā A58, 1-7.

Ramachandran, B. (1996b). Characteristic functions taking constant values on intervals of the real line. Statist. Probab. Lett. 28, 269-270.

Ramachandran, B. (1997). Characteristic functions with some powers real. III. Statist. Probab. Lett. 34, 33-36.

Ramachandran, B., and Rao, C.R. (1968). Some results on characteristic functions and characterizations of the normal and generalized stable laws. Sankhyā A30, 125-140.

Rao, C.R. (1965). Linear Statistical Inference and Its Applications. Wiley, New York.

Romano, J.P., and Siegel, A.F. (1986). Counterexamples in Probability and Statistics. Wadsworth & Brooks, Monterey, CA.

Rosen, B. (1961). On the asymptotic distribution for sums of independent random variables. Arkiv Matem. 4, 323-332.

Rosenblatt, M. (1975). A quadratic measure of deviation of two-dimensional density estimates and a test of independence. Ann. Statist. 3, 1-14.

Rossberg, H.-J. (1989). The Parseval equation as a tool for the continuation of characteristic functions and positive definite probability densities. Math. Nachr. 141, 227-232.

Sadooghi-Alvandi, S.M. (1993). A proof of the continuity theorem for characteristic functions. Statist. Probab. Lett. 16, 27-28.

Sakovich, G.N. (1965). On the width of spectra. Dopovidi A.N. Ukr. SSR 11, 1427-1430.

Salikhov, N.P. (1996). An estimate of the concentration function by the Esseen method. Theory Probab. Appl. 41, 504-518.

Sapogov, N.A. (1979). Weak stability of J. Marcinkiewicz's theorem and some inequalities for characteristic functions. Zap. Nauchn. Sem. LOMI 85, 193-196.
Schoenberg, I.J. (1938). Metric spaces and completely monotone functions. Ann. Math. 39, 811-841.

Sethuraman, J. (1964). On the probability of large deviations of families of sample means. Ann. Math. Statist. 35, 1304-1316.

Shervashidze, T. (1997). Bounds for the characteristic functions of the system of monomials in random variables and of its trigonometric analogue. Georgian Math. J. 4, 579-584.

Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. Chapman & Hall, London.

Staudte, R.G., and Tata, M.N. (1970). Complex roots of real characteristic functions. Proc. American Math. Soc. 25, 238-246.

Stoyanov, J. (1987). Counterexamples in Probability. Wiley, New York.

Thornton, J.C., and Paulson, A.S. (1977). Asymptotic distribution of characteristic function-based estimators for the stable laws. Sankhyā A39, 341-354.

Titchmarsh, E. (1937). Introduction to the Theory of Fourier Integrals. Clarendon Press, Oxford.

Trigub, R.M. (1989). A criterion for the characteristic function and a test of the Polya type for the radial functions of several variables. Theory Probab. Appl. 34, 805-810.

Tsaregradskii, I.P. (1958). On the uniform approximation of the binomial distribution by infinitely divisible laws. Theory Probab. Appl. 3, 470-474.

Ushakov, N.G. (1981a). Some inequalities for characteristic functions of unimodal distributions. Theory Probab. Appl. 26, 595-598.

Ushakov, N.G. (1981b). On the maximum of the probability density of a sum of independent random vectors with spherically symmetric distributions. Soviet Math. Dokl. 24, 560-562.

Ushakov, N.G. (1982). On a problem of Renyi. Theory Probab. Appl. 27, 361-362.

Ushakov, N.G. (1985). Upper estimates of maximum probability for sums of independent random vectors. Theory Probab. Appl. 30, 38-49.

Ushakov, N.G. (1997). Lower and upper bounds for characteristic functions. J. Math. Sci. 84, 1179-1189.

Ushakov, N.G. (1998). On discrete unimodality. J. Math. Sci. (to appear).
Ushakov, N.G., and Ushakov, V.G. (1999a). Some Inequalities for Characteristic Functions of Densities with Bounded Variation. Preprint Dept. Statistics, University of Oslo.

Ushakov, N.G., and Ushakov, V.G. (1999b). Some inequalities for multivariate characteristic functions. J. Math. Sci. (to appear).

Ushakov, N.G., and Ushakov, V.N. (1997). An estimation of the decomposition stability of the Poisson distribution into identical components. J. Math. Sci. 83, 103-108.

Ushakov, N.G., and Ushakova, A.P. (1995). Estimations of the decomposition stability into identical components. Statist. Probab. Lett. 25, 221-229.

Ushakov, V.G., and Ushakov, N.G. (1984). On indecomposable laws with infinitely divisible projections. Theory Probab. Appl. 29, No. 3, 596-598.

Ushakov, V.G., and Ushakov, N.G. (1986). On decomposing blends of probability distributions. Theory Probab. Appl. 31, 319-322.

Ushakov, V.G., and Ushakov, N.G. (1999). Several inequalities for characteristic functions. Vestnik Mosk. Univ., Ser. 15 (to appear).

Velikoivanenko, A.I. (1987). Multidimensional analogues of the Polya theorem. Theor. Probab. and Math. Statist. #34, 39-46.

Velikoivanenko, A.I. (1992). One-dimensional and multidimensional distributions of Polya type. Theor. Probab. and Math. Statist. #44, 29-35.

Wand, M.P., and Jones, M.C. (1995). Kernel Smoothing. Chapman & Hall, London.

Watson, G.S., and Leadbetter, M.R. (1963). On the estimation of the probability density. Ann. Math. Statist. 34, 480-491.

Welsh, A.H. (1984). A note on scale estimates based on the empirical characteristic function and their application to test for normality. Statist. Probab. Lett. 2, 345-348.

Welsh, A.H. (1985). An angular approach for linear data. Biometrika 72, 441-450.

Welsh, A.H. (1986). Implementing empirical characteristic function procedures. Statist. Probab. Lett. 4, 65-67.

Wolfe, S.J. (1975). On derivatives of characteristic functions. Ann. Probab. 3, 737-738.
Zaitsev, A.Yu. (1982). Estimates of the Levy-Prokhorov distance in terms of characteristic functions and some of their applications. Zap. Nauchn. Sem. Otdel. Mat. Inst. Steklov (LOMI) 119, 108-127.

Zaitsev, A.Yu. (1987). On the logarithmic factor in smoothing inequalities for the Levy and Levy-Prokhorov distances. Theory Probab. Appl. 31, 691-693.

Zolotarev, V.M. (1957). Mellin-Stieltjes transforms in probability theory. Theory Probab. Appl. 2, 433-460.

Zolotarev, V.M. (1965). On the closeness of distributions of two sums of independent random variables. Theory Probab. Appl. 10, 472-478.

Zolotarev, V.M. (1967). A sharpening of the inequality of Berry-Esseen. Z. Wahrscheinlichkeitstheorie 8, 332-342.

Zolotarev, V.M. (1968). On the problem of stability of the decomposition of the normal law into components. Theory Probab. Appl. 13, 697-700.

Zolotarev, V.M. (1970). Some new inequalities in probability connected with Levy's metric. Soviet Math. Dokl. 11, 231-234.

Zolotarev, V.M. (1971). Estimates of the difference between distributions in the Levy metric. Proc. Steklov Inst. Math. 112, 232-240.

Zolotarev, V.M. (1986). One-Dimensional Stable Distributions. American Math. Soc., Providence.

Zolotarev, V.M., and Senatov, V.V. (1975). Two-sided estimates of Levy's metric. Theory Probab. Appl. 20, 234-245.

Zolotarev, V.M., and Uchaikin, V.V. (1999). Chance and Stability. VSP, Utrecht.

Zygmund, A. (1947). A remark on characteristic functions. Ann. Math. Stat. 18, 272-276.
Subject Index

additive type 70, 85
α-symmetric distribution 64
bandwidth 199
Berman criterion 10
Bochner-Khintchine criterion 8, 25
    multi-dimensional 57
characteristic function 1
    analytic 52-54
    even 3
    multivariate 54
    of absolutely continuous distribution 1, 3
    of arc sine distribution 282
    of α-symmetric distribution 64
    of Bessel distribution 283
    of beta distribution 284
    of bilateral exponential distribution 286
    of binomial distribution 287
    of Cauchy distribution 288
    of convex Khintchine distribution 289
    of convolution 3
    of discrete distribution 1-3
    of discrete uniform distribution 290
    of discrete unimodal distribution 46-52
    of exponential distribution 291
    of gamma-distribution 292
    of geometrical distribution 293
    of hyperbolic cosine distribution 294
    of hyperexponential distribution 295
    of Laplace distribution 296
    of lattice distribution 2, 35
    of logarithmic distribution 297
    of logistic distribution 298
    of negative binomial distribution 299
    of normal distribution 300
    of Pearson distribution of the third type 301
    of Pearson distribution of the seventh type 302
    of Poisson distribution 303
    of projection 57
    of rectangular distribution 308
    of Simpson distribution 307
    of singular distribution 3
    of spherically symmetric distribution 62
    of stable distribution 304
    of Student distribution (t-distribution) 306
    of symmetric distribution
    of triangular distribution 307
    of t-distribution 306
    of uniform distribution 308
    of unimodal distribution 43-46
    of χ²-distribution 311
    of z-distribution 309
    periodic 2, 20, 21
    real 3
characteristic symmetry function 236
concentration function 31, 81
continuity theorem 4, 5
    multi-dimensional 55
convolution theorem 3
    multi-dimensional 55
covariance function 10
Cramer criterion 9
Cramer theorem 53
    multi-dimensional 56
derivatives of characteristic function 22, 39-40, 88
discrete unimodal distribution 46
    in multi-dimensional case 60
empirical characteristic function 160
    multi-dimensional 163
empirical characteristic process 174
empirical distribution function 160
expansion of characteristic function 40-41, 88
function of bounded variation 32, 79
integrated squared error estimator 195
inversion theorem 5, 35
    for density 6
    for lattice distributions 7
    multi-dimensional 56
        for density 56
kernel density estimator 199
Khintchine criterion 9
Levy metrics 34
λ-metrics 35
Marcinkiewicz theorem 23
mean absolute error (MAE) 200
mean integrated absolute error (MIAE) 200
mean integrated squared error (MISE) 200
mean squared error (MSE) 200
mixture 17
moment 39-42, 60
    absolute 39
nondegenerate distribution 114
nonnegative definite function 8
Parseval equality 7
Parseval-Plancherel identity 7, 8, 36
    multi-dimensional 57
Polya criterion 19, 58, 63
principal value 6, 133
projection 57, 61
    of normal distribution 61
    of spherically symmetric distribution 62
projection estimator 194
sine estimator 220-226
sine kernel 201, 219-226
smoothing parameter 199
spherically symmetric distribution 61-64, 129
stable distribution 188
sufficient conditions for characteristic function 16-22
superkernel 219
symmetrization 17
Trigub criterion 11
truncation inequalities 30
unimodal distribution 43
    spherically symmetric 62
uniqueness theorem 2
variation 32
weak convergence 4, 55
Author Index

Ash, R.B. 235
Askey, R. 45, 58, 65
Bahadur, R.R. 180
Baringhaus, L. 248, 249, 257, 258
Bartlett, M.S. 210
Bentkus, V. 158
Berman, S.M. 10, 65
Bhattacharya, R.N. 64
Billingsley, P. 234
Blum, J.R. 227, 256, 257
Brown, B.M. 42
Bröker, H.U. 186, 256
Cambanis, S. 64
Chiu, S.-T. 257
Cramer, H. 1, 65, 256
Csörgő, S. 158, 175, 227, 228, 230, 237, 238, 243, 248, 256-258
Cuppens, R. 1, 64
Danschke, R. 249
Daugavet, A.I. 158
Davis, K.B. 210, 257
De Silva, B.M. 257
Devroye, L. 181, 198, 200, 210, 256
Dharmadhikari, S. 52
Doob, J.L. 158
Epps, T.W. 257
Esseen, C.-G. 36, 65, 141
Fan, Y. 254, 258
Fang, K.-T. 61, 64
Feigin, P.D. 256, 258
Feller, W. 1, 37, 64, 259
Feuerverger, A. 175, 227, 231, 232, 236, 257
Frazer, D.A.S. 234
Gajek, G. 257
Galambos, J. 1, 64
Glad, I.K. 210, 257
Gnedenko, B.V. 1, 64, 141, 274
Gneiting, T. 64
Griffiths, C.R. 257
Györfi, L. 198, 200
Götze, F. 158
Hall, P. 257
Heathcote, C.R. 65, 186, 237, 238, 256-258
Hengartner, W. 81, 105, 112
Henze, N. 248, 249, 257, 258
Hjort, N.L. 210, 257
Hoeffding, W. 182, 227
Holcomb, E.W. 256
Holtsmark, J. 188
Horowitz, J.L. 197, 198, 256, 257
Hsu, P.L. 42
Hüsler, J. 186, 256
Ibragimov, I.A. 1, 64, 257
Joag-Dev, K. 52
Jones, M.C. 201, 210, 257
Kagan, A.M. 42, 70
Kallenberg, O. 65
Kankainen, A. 227, 231, 257
Kawata, T. 1, 64
Keener, R. 64
Keller, H.-D. 175, 256
Kellermeier, J. 258
Kent, J.T. 256
Khasminskii, R.Z. 257
Khintchine, A.Ya. 43, 48, 49
Kiefer, J. 227
Kolchinskii, V.I. 256
Kolmogorov, A.N. 1, 64, 141, 274
Konakov, V.D. 257
Kotz, S. 61, 64
Koutrouvelis, I.A. 256, 258
Kruglov, V.M. 277
Kuznetsov, S.M. 265
Laha, R.G. 1, 64
Larin, Yu.V. 141
Leadbetter, M.R. 257
Lehmann, E.L. 273
Leitch, R.A. 193, 256
Lenth, R.V. 197, 198, 256, 257
Linnik, Yu.V. 1, 36, 42, 52, 64, 70, 275
Loeve, M. 1, 64
Lukacs, E. 1, 42, 52, 64, 259, 262, 271, 274
Levy, P. 64
Mandelbrot, B. 188
Marcus, M.B. 175, 256
Markatou, M. 197, 198, 256, 257
McDunnough, P. 256
Medgyessy, P. 46, 65
Meshalkin, L.D. 65
Mureika, R.A. 175, 232, 236, 256, 257
Murison, R.D. 257
Murota, K. 257, 258
Naito, K. 257
Navard, S. 50
Ng, K.W. 61, 64
Oberhettinger, F. 281
Ostrovskii, I.V. 52, 64, 275
Parzen, E. 210
Paulson, A.S. 200, 256
Petrov, V.V. 1, 65, 158, 167
Pitman, J.W. 25, 65
Pólya, G. 19
Postnikova, L.P. 65
Prakasa Rao, B.L.S. 257
Prawitz, H. 134-136, 158
Press, S.J. 188, 256
Prokhorov, Yu.V. 1, 64, 70, 158, 259
Pulley, L.B. 257
Raikov, D.A. 65
Ramachandran, B. 1, 42, 52, 64, 259, 270
Ranga Rao, R. 64
Rao, C.R. 42, 70, 192
Rogozin, B.A. 65
Romano, J.P. 259
Rosenblatt, M. 227
Rozanov, Yu.A. 1, 64, 259
Sakovich, G.N. 158
Salikhov, N.P. 141, 158
Sapogov, N.A. 158
Schoenberg, I.J. 61
Seaman, J. 50
Senatov, V.V. 34, 65, 152
Sethuraman, J. 176, 179
Siegel, A.F. 259
Silverman, B.W. 250, 257
Simons, G. 64
Stoyanov, J. 259
Susarla, V. 256, 257
Takeuchi, K. 257, 258
Theodorescu, R. 81, 105, 112
Thornton, J.C. 256
Titchmarsh, E. 199
Totik, V. 256
Trigub, R.M. 11, 58, 65
Tsaregradskii, I.P. 65
Ushakov, N.G. 51, 52, 65, 158, 210, 231, 257, 265, 276

Ushakov, V.G. 158, 276
Ushakova, A.P. 158
Van Eeden, C. 207
Van Es, A.J. 207
Velikoivanenko, A.I. 58, 62, 65, 278
Wagner, T. 248, 257, 258
Wand, M.P. 201, 210, 257
Watson, G.S. 257
Welsh, A.H. 256, 257
Young, D. 50
Yudin, A.A. 65
Zaitsev, A.Yu. 35
Zirkler, B. 248, 257, 258
Zolotarev, V.M. 9, 34, 42, 65, 152, 188
Zygmund, A. 271
