
Outline:

• Neyman-Pearson test for simple binary hypotheses, receiver operating characteristic (ROC).

• An introduction to classical composite hypothesis testing.

Reading:

• Chapter 3 in Kay-II,

• (part of) Chapter 5 in Levy.

EE 527, Detection and Estimation Theory, # 5c 1


False-alarm and Detection Probabilities for
Binary Hypothesis Tests: A Reminder
(see handout # 5)

In binary hypothesis testing, we wish to identify which hypothesis is true (i.e. make the appropriate decision):

H0 : θ ∈ spΘ(0)   null hypothesis versus
H1 : θ ∈ spΘ(1)   alternative hypothesis

where

spΘ(0) ∪ spΘ(1) = spΘ, spΘ(0) ∩ spΘ(1) = ∅.

Recall that a binary decision rule φ(x) maps the data space X to {0, 1}:

         { 1, decide H1,
φ(x) =   {
         { 0, decide H0,

which partitions the data space X [i.e. the support of fX | Θ(x | θ)] into two regions:

X0 = {x : φ(x) = 0} and X1 = {x : φ(x) = 1}.



Recall the probabilities of false alarm and miss:

PFA(φ(X), θ) = E X | Θ[φ(X) | θ]
             = ∫_{X1} fX | Θ(x | θ) dx   for θ ∈ spΘ(0)                  (1)

PM(φ(X), θ) = E X | Θ[1 − φ(X) | θ]
            = 1 − ∫_{X1} fX | Θ(x | θ) dx   [the integral equals PD(φ(X), θ)]
            = ∫_{X0} fX | Θ(x | θ) dx   for θ ∈ spΘ(1)                   (2)

and the probability of detection (correctly deciding H1):


PD(φ(X), θ) = E X | Θ[φ(X) | θ] = ∫_{X1} fX | Θ(x | θ) dx   for θ ∈ spΘ(1).

For simple hypotheses, spΘ(0) = {θ0}, spΘ(1) = {θ1}, and spΘ = {θ0, θ1}, the above expressions simplify, as shown in the following.



Probabilities of False Alarm (PFA) and
Detection (PD) for Simple Hypotheses

PFA(φ(X), θ0) = ∫_{X1} fX | Θ(x | θ0) dx
              = PrX | Θ{test statistic (ts) > τ | θ0}                    (3)

PD(φ(X), θ1) = ∫_{X1} fX | Θ(x | θ1) dx
             = PrX | Θ{ts > τ | θ1}                                      (4)

where the event {ts > τ} is the same as the event {X ∈ X1}.

Comments:

(i) As the region X1 shrinks (i.e. τ ↗ +∞), both of the above probabilities shrink toward zero.

(ii) As the region X1 grows (i.e. τ ↘ 0), both probabilities grow toward unity.

(iii) Observations (i) and (ii) do not imply equality between PFA and PD; in most cases, as X1 grows, PD grows more rapidly than PFA (i.e. we had better be right more often than we are wrong).

(iv) However, the perfect case where our rule is always right
and never wrong (PD = 1 and PFA = 0) cannot occur when
the conditional pdfs/pmfs fX | Θ(x | θ0) and fX | Θ(x | θ1)
overlap.

(v) Thus, to increase the detection probability PD, we must also allow the false-alarm probability PFA to increase.
This behavior
• represents the fundamental tradeoff in hypothesis testing
and detection theory and
• motivates us to introduce a (classical) approach to testing
simple hypotheses, pioneered by Neyman and Pearson, to
be discussed next.

The receiver operating characteristic (ROC) allows us to visualize the realm of achievable PFA(φ(X), θ0) and PD(φ(X), θ1).



A point (PFA, PD) is in the shaded region if we can find a rule
φ(X) such that PFA(φ(X), θ0) = PFA and PD(φ(X), θ1) = PD.



Neyman-Pearson Test for Simple Hypotheses

Bayesian tests are criticized because they require specification of a prior distribution (pmf or, in the composite-testing case, pdf) and the cost-function parameters L(i | j).

An alternative classical solution for simple hypotheses was developed by Neyman and Pearson.

Select the decision rule φ(X) that maximizes PD(φ(X), θ1) while ensuring that the probability of false alarm PFA(φ(X), θ0) is less than or equal to a specified level α.

Setup:

• Simple hypothesis testing:

H0 : θ = θ0 versus
H1 : θ = θ1.

• Parametric data models fX | Θ(x | θ0), fX | Θ(x | θ1).

• No prior pdf/pmf on Θ is available.



• Define the set of all rules φ(X) whose probability of false alarm is less than or equal to a specified level α:

Dα = {φ(X) : PFA(φ(X), θ0) ≤ α}

see also (3).

A Neyman-Pearson test φNP(x) solves the constrained optimization problem:

φNP(x) = arg max_{φ(x)∈Dα} PD(φ(x), θ1).

We apply Lagrange multipliers to solve this optimization problem; consider the Lagrangian:

L(φ(x), λ) = PD(φ(x), θ1) + λ [α − PFA(φ(x), θ0)]

with λ ≥ 0. A decision rule φ(x) will be optimal if it maximizes L(φ(x), λ) and satisfies the Karush-Kuhn-Tucker (KKT) condition:

λ [α − PFA(φ(x), θ0)] = 0.                                               (5)

Upon using (3) and (4), the Lagrangian can be written as

L(φ(x), λ) = λ α + ∫_{X1} [fX | Θ(x | θ1) − λ fX | Θ(x | θ0)] dx.



Consider maximizing L(φ(x), λ) with respect to φ(x) for a
given λ. Then, φ(x) needs to satisfy

         { 1,      Λ(x) > λ
φλ(x) =  { 0 or 1, Λ(x) = λ                                              (6)
         { 0,      Λ(x) < λ

where

Λ(x) = fX | Θ(x | θ1) / fX | Θ(x | θ0)
is the likelihood ratio. The values x that satisfy Λ(x) = λ
can be allocated to either X1 or X0. To completely specify the
optimal test, we need to select

• a λ such that the KKT condition (5) holds and

• an allocation rule for those x that satisfy Λ(x) = λ.
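As a concrete sketch of the likelihood-ratio rule (6), the following Python fragment evaluates Λ(x) for a hypothetical scalar Gaussian pair of densities (the means mu0, mu1 and the noise level sigma are illustrative values, not from the notes) and applies the threshold, allocating the ties Λ(x) = λ to X0 for simplicity:

```python
from math import exp, pi, sqrt

def likelihood_ratio(x, mu0=0.0, mu1=1.0, sigma=1.0):
    """Lambda(x) = f(x | theta1) / f(x | theta0) for an illustrative
    scalar Gaussian example; mu0, mu1, sigma are assumed values."""
    def normal_pdf(v, mu, s):
        return exp(-0.5 * ((v - mu) / s) ** 2) / (s * sqrt(2.0 * pi))
    return normal_pdf(x, mu1, sigma) / normal_pdf(x, mu0, sigma)

def phi(x, lam):
    """Rule (6), with ties Lambda(x) = lam allocated to X0 (decide H0)."""
    return 1 if likelihood_ratio(x) > lam else 0
```

For mu1 > mu0 the likelihood ratio is increasing in x, so the rule reduces to comparing x itself against a threshold, as in the examples later in these notes.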

Now, consider two versions of (6) for a fixed threshold λ:



          { 1, Λ(x) > λ
φU,λ(x) = { 1, Λ(x) = λ
          { 0, Λ(x) < λ

and

          { 1, Λ(x) > λ
φL,λ(x) = { 0, Λ(x) = λ
          { 0, Λ(x) < λ.



In the first case, all observations x for which Λ(x) = λ are
allocated to X1; in the second case, these observations are
allocated to X0.
Consider the cumulative distribution function (cdf) of Λ(X) =
Λ under H0:

FΛ | Θ(l | θ0) = PrΛ | Θ{Λ ≤ l | θ0}.

Define
f0 = FΛ | Θ(0 | θ0) = PrΛ | Θ{Λ ≤ 0 | θ0}.
Recall that the cdf FΛ | Θ(l | θ0) must be nondecreasing and right-continuous, but may have discontinuities.



Consider three cases, depending on α:

(i) When

1 − α < f0   i.e.   1 − f0 < α                                           (7)

we select the threshold λ = 0 and apply the rule

          { 1, Λ(x) > 0
φL,0(x) = {                                                              (8)
          { 0, Λ(x) = 0.

In this case, KKT condition (5) holds and, therefore, the test (8) is optimal; its probability of false alarm is

PFA(φL,0(x), θ0) = 1 − f0 < α   [see (7)].

An example of this case corresponds to λ1 = 0 and 1 − α1 in the above figure.

(ii) Suppose that

1 − α ≥ f0 i.e. 1 − f0 ≥ α (9)

and there exists a λ such that

FΛ | Θ(λ | θ0) = 1 − α. (10)

Then, by selecting this λ as the threshold and using

          { 1, Λ(x) > λ
φL,λ(x) = {                                                              (11)
          { 0, Λ(x) ≤ λ



we obtain a test with false-alarm probability

PFA(φL,λ(x), θ0) = 1 − FΛ | Θ(λ | θ0) = α   [see (10)],

the KKT condition (5) holds, and the test (11) is optimal. An example of this case corresponds to λ2 and 1 − α2 in the above figure.

(iii) Suppose that

1 − α ≥ f0   i.e.   1 − f0 ≥ α

as in (ii), but the cdf FΛ | Θ(l | θ0) has a discontinuity point λ > 0 such that

FΛ | Θ(λ− | θ0) < 1 − α < FΛ | Θ(λ+ | θ0)

where FΛ | Θ(λ− | θ0) and FΛ | Θ(λ+ | θ0) denote the left and right limits of FΛ | Θ(l | θ0) at l = λ. If this case happens in practice, we can try to avoid the problem by changing our specified α, which is anyway not God-given, but chosen rather arbitrarily. We should pick a value of α that satisfies the KKT condition.

Suppose that we are not allowed to change α; this gives us a chance to practice some basic probability. First, note that



• φL,λ(x) has false-alarm probability

PFA(φL,λ(x), θ0) = 1 − FΛ | Θ(λ+ | θ0) < α,

• φU,λ(x) has false-alarm probability

PFA(φU,λ(x), θ0) = 1 − FΛ | Θ(λ− | θ0) > α

and the KKT optimality condition (5) requires that PFA(φλ(x), θ0) = α. We focus on tests of the form (6) and construct the optimal test via randomization.

Define the probability

p = [α − PFA(φL,λ(x), θ0)] / [PFA(φU,λ(x), θ0) − PFA(φL,λ(x), θ0)]

which clearly satisfies 0 < p < 1.



Select φU,λ(x) with probability p and φL,λ(x) with
probability 1 − p. This test indeed has the form (6); its
probability of false alarm is

PFA(φλ(x), θ0)
= PFA(φL,λ(x), θ0) + p [PFA(φU,λ(x), θ0) − PFA(φL,λ(x), θ0)] = α.

Since KKT condition (5) is satisfied, the randomized test

         { 1,                            Λ(x) > λ
φλ(x) =  { 1 w.p. p and 0 w.p. 1 − p,   Λ(x) = λ
         { 0,                            Λ(x) < λ

is optimal.
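The randomization step can be sketched numerically. In this hedged Python fragment, the level α and the two false-alarm probabilities at the cdf discontinuity are made-up illustrative numbers; the point is only that the mixture attains PFA = α exactly:

```python
def randomization_prob(alpha, pfa_L, pfa_U):
    """p = (alpha - PFA_L) / (PFA_U - PFA_L); requires PFA_L < alpha < PFA_U."""
    assert pfa_L < alpha < pfa_U
    return (alpha - pfa_L) / (pfa_U - pfa_L)

# hypothetical numbers at a discontinuity of the cdf of Lambda given theta0
alpha, pfa_L, pfa_U = 0.05, 0.02, 0.10
p = randomization_prob(alpha, pfa_L, pfa_U)
# false-alarm probability of the randomized mixture; equals alpha by construction
pfa_mixed = pfa_L + p * (pfa_U - pfa_L)
```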



ROC Properties when Likelihood Ratio is a
Continuous Random Variable Given θ

Based on the Neyman-Pearson theory, if we set PFA = α, then the test that maximizes PD must be a likelihood-ratio test of the form (6). Thus, the ROC curve separating achievable and non-achievable pairs (PFA, PD) corresponds to the family of likelihood-ratio tests.

For simplicity, we focus here on the case where the likelihood ratio is a continuous random variable given θ. First, note that,



for the likelihood-ratio test,

PFA(τ) = ∫_{X1} fX | Θ(x | θ0) dx = PrX | Θ{Λ(X) > τ | θ0} = ∫_τ^{+∞} fΛ | Θ(l | θ0) dl    (12)

PD(τ) = ∫_{X1} fX | Θ(x | θ1) dx = PrX | Θ{Λ(X) > τ | θ1} = ∫_τ^{+∞} fΛ | Θ(l | θ1) dl     (13)

where τ denotes the threshold. Under the continuity assumption for the likelihood ratio, as we vary τ between 0 and +∞, the point (PFA(φ(X), θ0), PD(φ(X), θ1)) moves continuously along the ROC curve. If we set τ = 0, we always select H1 and, therefore,

PFA(0) = PD(0) = 1.

Conversely, if we set τ = +∞, we always select H0 and, therefore,

PFA(+∞) = PD(+∞) = 0.
In summary,
ROC Property 1. If the likelihood ratio is a continuous random variable given θ, the points (0, 0) and (1, 1) belong to the ROC.



Now, differentiate (12) and (13) with respect to τ:

dPD(τ)/dτ = −fΛ | Θ(τ | θ1)
dPFA(τ)/dτ = −fΛ | Θ(τ | θ0)

implying

dPD(τ)/dPFA(τ) = fΛ | Θ(τ | θ1) / fΛ | Θ(τ | θ0) = τ.
In summary,
ROC Property 2. If the likelihood ratio is a continuous
random variable given θ, the slope of ROC at point
(PFA(τ ), PD(τ )) is equal to the threshold τ of the corresponding
likelihood-ratio test.
In particular, this result implies that the slope of ROC is

• τ = +∞ at (0, 0) and

• τ = 0 at (1, 1).
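ROC Properties 2 and 4 can be checked numerically for an assumed scalar Gaussian mean-shift model (X ~ N(0, 1) under H0 and X ~ N(d, 1) under H1, so that (12)-(13) have the closed forms below); a centered finite difference recovers the slope τ:

```python
from math import log
from statistics import NormalDist

_nd = NormalDist()

def Q(x):
    """Right-tail probability of a standard normal."""
    return 1.0 - _nd.cdf(x)

def roc_point(tau, d):
    """(PFA(tau), PD(tau)) of the LR test for a scalar Gaussian mean
    shift with deflection d; valid under the stated assumptions."""
    a = log(tau) / d
    return Q(a + d / 2.0), Q(a - d / 2.0)

def roc_slope(tau, d, h=1e-6):
    """Centered finite-difference estimate of dPD/dPFA at threshold tau."""
    pfa_hi, pd_hi = roc_point(tau + h, d)
    pfa_lo, pd_lo = roc_point(tau - h, d)
    return (pd_hi - pd_lo) / (pfa_hi - pfa_lo)
```

The estimated slope at τ = 1.5 comes out close to 1.5, as ROC Property 2 predicts, and PD stays above PFA along the curve, as ROC Property 4 predicts.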

ROC Property 3. The domain of achievable pairs (PFA, PD) is convex and the ROC curve is concave. This property holds in general, including the case where the likelihood ratio is a mixed or discrete random variable given θ.

HW: Prove ROC Property 3.



ROC Property 4. All points on the ROC curve satisfy

PD(τ) ≥ PFA(τ).

This property holds in general, including the case where the likelihood ratio is a mixed or discrete random variable given θ.



Example: Simple Hypotheses,
Coherent Detection in Gaussian Noise with
Known Covariance Matrix

Simple hypotheses: the space of the parameter µ and its partitions are

spµ = {µ0, µ1},   spµ(0) = {µ0},   spµ(1) = {µ1}.

The measurement vector X given µ is modeled using

fX | µ(x | µ) = N(x | µ, C) = (1/√|2πC|) exp[−½ (x − µ)ᵀ C⁻¹ (x − µ)]

where C is a known positive definite covariance matrix. Our likelihood-ratio test is

Λ(x) = fX | µ(x | µ1) / fX | µ(x | µ0)        (likelihood ratio)
     = exp[−½ (x − µ1)ᵀ C⁻¹ (x − µ1)] / exp[−½ (x − µ0)ᵀ C⁻¹ (x − µ0)]  ≷  τ   (decide H1 if >).



Therefore,

−½ (x − µ1)ᵀ C⁻¹ (x − µ1) + ½ (x − µ0)ᵀ C⁻¹ (x − µ0)  ≷  ln τ   (decide H1 if >)

i.e.

(µ1 − µ0)ᵀ C⁻¹ [x − ½ (µ0 + µ1)]  ≷  ln τ

and, finally,

T(x) = sᵀ C⁻¹ x  ≷  ln τ + ½ (µ1 − µ0)ᵀ C⁻¹ (µ1 + µ0)  ≜  γ   (decide H1 if >)

where we have defined

s ≜ µ1 − µ0.

False-alarm and detection/miss probabilities. Given µ, T(X) is a linear combination of Gaussian random variables, implying that it is also Gaussian, with mean and variance:

E X | µ[T(X) | µ] = sᵀ C⁻¹ µ
varX | µ[T(X) | µ] = sᵀ C⁻¹ s   (not a function of µ).



Now,

PFA = PrX | µ{T(X) > γ | µ0}
    = PrX | µ{ [T(X) − sᵀC⁻¹µ0]/√(sᵀC⁻¹s)  >  [γ − sᵀC⁻¹µ0]/√(sᵀC⁻¹s)  | µ0 }
      (the quantity on the left of the inequality is a standard normal random variable)
    = Q( [γ − sᵀC⁻¹µ0]/√(sᵀC⁻¹s) )                                       (14)

and

PD = 1 − PM = PrX | µ{T(X) > γ | µ1}
   = PrX | µ{ [T(X) − sᵀC⁻¹µ1]/√(sᵀC⁻¹s)  >  [γ − sᵀC⁻¹µ1]/√(sᵀC⁻¹s)  | µ1 }
     (the quantity on the left of the inequality is a standard normal random variable)
   = Q( [γ − sᵀC⁻¹µ1]/√(sᵀC⁻¹s) ).

We use (14) to obtain a γ that satisfies the specified PFA:

γ/√(sᵀC⁻¹s) = Q⁻¹(PFA) + sᵀC⁻¹µ0/√(sᵀC⁻¹s)
implying
PD = Q( Q⁻¹(PFA) − √(sᵀC⁻¹s) )
   = Q( Q⁻¹(PFA) − d ).                                                  (15)

Here,

d = √(sᵀC⁻¹s) = √( (µ1 − µ0)ᵀ C⁻¹ (µ1 − µ0) )

is the deflection coefficient.
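The ROC expression (15) is easy to evaluate numerically. The following sketch uses the stdlib standard-normal helpers (`statistics.NormalDist`); the signal and covariance values are made-up illustrations for a diagonal C:

```python
from math import sqrt
from statistics import NormalDist

_nd = NormalDist()

def Q(x):
    """Right-tail probability of a standard normal."""
    return 1.0 - _nd.cdf(x)

def Qinv(p):
    """Inverse of Q."""
    return _nd.inv_cdf(1.0 - p)

def coherent_pd(pfa, d):
    """PD = Q(Qinv(PFA) - d), equation (15)."""
    return Q(Qinv(pfa) - d)

# deflection coefficient d = sqrt(s^T C^{-1} s) for an illustrative
# diagonal covariance C = diag(c) and difference signal s = mu1 - mu0
s = [1.0, 2.0]
c = [1.0, 4.0]
d = sqrt(sum(si * si / ci for si, ci in zip(s, c)))
```

Larger d pushes the ROC toward the ideal (PFA, PD) = (0, 1) corner, which is the sense in which d measures detectability.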



Decentralized Detection for Simple Hypotheses

Consider a decentralized detection scenario with N spatially distributed sensors (nodes) reporting to a fusion center.

Assumptions:

• The observations X[n], n = 0, 1, . . . , N − 1 made at the N sensors follow the same marginal probabilistic model:

fX | Θ(x[n] | θ)                                                         (16)

and are conditionally independent given Θ = θ, which may not always be reasonable, but leads to an easy solution.



• We wish to test:

H0 : θ = θ0 versus
H1 : θ = θ1.

• Each node n makes a hard local decision d[n] based on its local observation x[n] and sends it to the headquarters
(fusion center), which collects all the local decisions and
makes the final global decision H0 versus H1. This structure
is clearly suboptimal: it is easy to construct a better
decision strategy in which each node sends its (quantized,
in practice) likelihood ratio to the fusion center, rather than
the decision only. However, such a strategy would have a
higher communication (energy) cost.
The false-alarm and detection probabilities of each node’s
local decision rules can be computed using (16). Suppose
that we have obtained them for each n:

PFA,n, PD,n, n = 0, 1, . . . , N − 1.

We now discuss the decentralized detection problem. Note that

pD(n) | Θ(dn | θ1) = PD,n^dn (1 − PD,n)^(1−dn)     (Bernoulli pmf)



and, similarly,

pD(n) | Θ(dn | θ0) = PFA,n^dn (1 − PFA,n)^(1−dn)     (Bernoulli pmf)

where PFA,n is the nth sensor's local false-alarm probability. Now,

ln Λ(d) = Σ_{n=0}^{N−1} ln[ pD(n) | Θ(dn | θ1) / pD(n) | Θ(dn | θ0) ]
        = Σ_{n=0}^{N−1} ln[ PD,n^dn (1 − PD,n)^(1−dn) / ( PFA,n^dn (1 − PFA,n)^(1−dn) ) ]  ≷  ln τ   (decide H1 if >).

To further simplify the above expression, we now focus on the case where all sensors have identical performance:

PD,n = PD,   PFA,n = PFA

i.e. all local decision thresholds at the nodes are identical.


Define the number of sensors deciding locally to support H1:

u1 = Σ_{n=0}^{N−1} d[n].



Then, the log-likelihood ratio becomes

ln Λ(d) = u1 ln(PD/PFA) + (N − u1) ln[(1 − PD)/(1 − PFA)]  ≷  ln τ   (decide H1 if >)

or

u1 ln[ PD (1 − PFA) / (PFA (1 − PD)) ]  ≷  ln τ + N ln[ (1 − PFA)/(1 − PD) ].    (17)

Clearly, each node's local decision dn is meaningful only if PD > PFA, which implies

PD (1 − PFA) / (PFA (1 − PD)) > 1

the logarithm of which is therefore positive, and the decision rule (17) further simplifies to

u1 ≷ τ′   (decide H1 if >).

The Neyman-Pearson performance analysis of this detector is easy: the random variable U1 is binomial given θ (i.e. conditional on the hypothesis) and, therefore,

PrU1 | Θ{U1 = u1 | θ} = (N choose u1) p^u1 (1 − p)^(N−u1)



where p = PFA under H0 and p = PD under H1. Hence, the "global" false-alarm probability is

PFA,global = PrU1 | Θ{U1 > τ′ | θ0} = Σ_{u1=⌈τ′⌉}^{N} (N choose u1) PFA^u1 (1 − PFA)^(N−u1).
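For an integer threshold, the binomial tail above can be computed directly. This is a sketch (the sensor count, local PFA, and threshold below are illustrative), with Pr{U1 > k | H0} summed from u1 = k + 1:

```python
from math import comb

def global_pfa(N, pfa_local, k):
    """Pr{U1 > k | theta0} for N identical sensors, each with local
    false-alarm probability pfa_local (binomial tail)."""
    return sum(comb(N, u) * pfa_local**u * (1.0 - pfa_local)**(N - u)
               for u in range(k + 1, N + 1))
```

For example, with N = 5 sensors of local PFA = 0.1, requiring at least one local detection (k = 0) gives a global false-alarm probability of 1 − 0.9⁵; the same function with p = PD in place of PFA gives the global detection probability.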



An Introduction to Classical Composite
Hypothesis Testing

First, recall that, in composite testing of two hypotheses, we have spΘ(0) and spΘ(1) that form a partition of the parameter space spΘ:

spΘ(0) ∪ spΘ(1) = spΘ,   spΘ(0) ∩ spΘ(1) = ∅

and that we wish to identify which of the two hypotheses is true:

H0 : Θ ∈ spΘ(0)   null hypothesis versus
H1 : Θ ∈ spΘ(1)   alternative hypothesis.

Here, we adopt the classical Neyman-Pearson approach: given an upper bound α on the false-alarm probability, maximize the detection probability.

The fact that H0 is composite means that the false-alarm probability for a rule φ(X) is a function of θ:

PFA(φ(X), θ)

where θ ∈ spΘ(0). Therefore, to satisfy the upper bound α, we



consider all tests φ(X) such that

max_{θ∈spΘ(0)} PFA(φ(X), θ) ≤ α.                                         (18)

In this context,

max_{θ∈spΘ(0)} PFA(φ(X), θ)                                              (19)

is typically referred to as the size of the test φ(X). Therefore, condition (18) states that we focus on tests whose size is upper-bounded by α.

Definition. Among all tests φ(X) whose size is upper-bounded by α [i.e. (18) holds], we say that φUMP(X) is a uniformly most powerful (UMP) test if it satisfies

PD(φUMP(X), θ) ≥ PD(φ(X), θ)

for all θ ∈ spΘ(1).

This is a very strong statement, and very few hypothesis-testing problems have UMP tests. Note that Neyman-Pearson tests for simple hypotheses are UMP.

Hence, to find a UMP test for composite hypotheses, we need to first write a likelihood ratio for the simple hypothesis test with spΘ(0) = {θ0}, spΘ(1) = {θ1}, and spΘ = {θ0, θ1}, and then transform this likelihood ratio in such a way that unknown quantities (e.g. θ0 and θ1) disappear from the test statistic.



(1) If such a transformation can be found, there is hope that a UMP test exists.

(2) However, we still need to figure out how to set a decision threshold (τ, say) such that the upper bound (18) is satisfied.



Example 1: Detecting a Positive DC Level in
AWGN (versus zero DC level)

Consider the following composite hypothesis-testing problem:

H0 : θ = 0   i.e. θ ∈ spΘ(0) = {0}   versus
H1 : θ > 0   i.e. θ ∈ spΘ(1) = (0, +∞)

where the measurements X[0], X[1], . . . , X[N − 1] are conditionally independent, identically distributed (i.i.d.) given Θ = θ, modeled as

{X[n] | Θ = θ} = θ + W[n],   n = 0, 1, . . . , N − 1

with W[n] zero-mean white Gaussian noise with known variance σ², i.e.

W[n] ∼ N(0, σ²)

implying

fX | Θ(x | θ) = (2πσ²)^(−N/2) exp[ −(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − θ)² ]    (20)

where x = [x[0], x[1], . . . , x[N − 1]]ᵀ. A sufficient statistic for θ is

x̄ = (1/N) Σ_{n=0}^{N−1} x[n].



Now, find the pdf of x̄ given Θ = θ:

fX̄ | Θ(x̄ | θ) = N(x̄ | θ, σ²/N).                                          (21)

We start by writing the classical Neyman-Pearson test for the simple hypotheses with spΘ^simple(0) = {0} and spΘ^simple(1) = {θ1}, θ1 ∈ (0, +∞):

fX̄ | Θ(x̄ | θ1) / fX̄ | Θ(x̄ | 0)
  = { (2πσ²/N)^(−1/2) exp[−(x̄ − θ1)²/(2σ²/N)] } / { (2πσ²/N)^(−1/2) exp[−x̄²/(2σ²/N)] }  ≷  λ   (decide H1 if >).

Taking the log etc. leads to

θ1 x̄ ≷ η   (decide H1 if >).

Since we know that θ1 > 0, we can divide both sides of the above expression by θ1 and accept H1 if

φ(x) :  x̄ > τ.

Hence, we transformed our likelihood ratio in such a way that θ1 disappears from the test statistic, i.e. we accomplished (1) above.

Now, on to (2). How do we determine the threshold τ such that the upper bound (18) is satisfied? Based on (21), we know:

fX̄ | Θ(x̄ | 0) = N(x̄ | 0, σ²/N)



and, therefore,

PFA(φ(X), 0) = PrX̄ | Θ{X̄ > τ | 0}
             = PrX̄ | Θ{ (X̄ − 0)/√(σ²/N) > τ/√(σ²/N) | 0 }
               (the quantity on the left of the inequality is a standard normal random variable)
             = Q( τ/√(σ²/N) ).

Note that

max_{θ∈spΘ(0)} PFA(φ(X), θ) = PFA(φ(X), 0) = Q( τ/√(σ²/N) ) = α

[see (18) and (19)]. The most powerful test is achieved when the upper bound α in (18) is attained with equality:

τ = √(σ²/N) · Q⁻¹(α).                                                    (22)

Hence, we have accomplished (2), since this τ yields exactly size α for our test φ(X).

To study the performance of the above test, we substitute


(22) into the power function:

PrX̄ | Θ{X̄ > τ | θ} = PrX̄ | Θ{ (X̄ − θ)/√(σ²/N) > (τ − θ)/√(σ²/N) | θ }
                      (the quantity on the left of the inequality is a standard normal random variable)
                    = Q( (τ − θ)/√(σ²/N) ) = Q( Q⁻¹(α) − θ/√(σ²/N) ).    (23)
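A quick numeric sketch of (22) and (23), using the stdlib standard-normal helpers (the function names below are ours, not from the notes):

```python
from math import sqrt
from statistics import NormalDist

_nd = NormalDist()

def Q(x):
    """Right-tail probability of a standard normal."""
    return 1.0 - _nd.cdf(x)

def Qinv(p):
    """Inverse of Q."""
    return _nd.inv_cdf(1.0 - p)

def dc_level_test(alpha, sigma2, N):
    """Threshold (22) and power function (23) for the one-sided DC-level test."""
    tau = sqrt(sigma2 / N) * Qinv(alpha)                      # equation (22)
    def power(theta):
        return Q(Qinv(alpha) - theta / sqrt(sigma2 / N))      # equation (23)
    return tau, power
```

The power equals α at θ = 0 and increases with θ, which is the monotonicity exploited again in Example 2.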



Example 2: Detecting a Positive DC Level in
AWGN (versus nonnegative DC level)

Consider the following composite hypothesis-testing problem:

H0 : θ ≤ 0   i.e. θ ∈ spΘ(0) = (−∞, 0]   versus
H1 : θ > 0   i.e. θ ∈ spΘ(1) = (0, +∞)

where the measurements X[0], X[1], . . . , X[N − 1] are conditionally i.i.d. given Θ = θ, modeled as

{X[n] | Θ = θ} = θ + W[n],   n = 0, 1, . . . , N − 1

with W[n] zero-mean white Gaussian noise with known variance σ², i.e.

W[n] ∼ N(0, σ²)

implying

fX | Θ(x | θ) = (2πσ²)^(−N/2) exp[ −(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − θ)² ]    (24)

where x = [x[0], x[1], . . . , x[N − 1]]ᵀ. A sufficient statistic for θ is

x̄ = (1/N) Σ_{n=0}^{N−1} x[n]



and

fX̄ | Θ(x̄ | θ) = N(x̄ | θ, σ²/N).                                          (25)

We start by writing the classical Neyman-Pearson test for the simple hypotheses with spΘ^simple(0) = {θ0} and spΘ^simple(1) = {θ1}, where θ0 ∈ (−∞, 0] and θ1 ∈ (0, +∞), implying

fX̄ | Θ(x̄ | θ1) / fX̄ | Θ(x̄ | θ0)
  = { (2πσ²/N)^(−1/2) exp[−(x̄ − θ1)²/(2σ²/N)] } / { (2πσ²/N)^(−1/2) exp[−(x̄ − θ0)²/(2σ²/N)] }  ≷  λ   (decide H1 if >)

with θ0 < θ1.
Taking the log etc. leads to

(θ1 − θ0) x̄ ≷ η   (decide H1 if >)

and, since θ0 < θ1, to

φ(x) :  x̄ > τ.

Hence, we transformed our likelihood ratio in such a way that θ0 and θ1 disappear from the test statistic, i.e. we accomplished (1) above.

The power function of this test is

PrX̄ | Θ{X̄ > τ | θ} = PrX̄ | Θ{ (X̄ − θ)/(σ/√N) > (τ − θ)/(σ/√N) | θ } = Q( (τ − θ)/(σ/√N) )



which is an increasing function of θ. Recall the definition (19) of test size:

max_{θ∈spΘ(0)} PFA(φ(X), θ) = max_{θ∈spΘ(0)} PrX̄ | Θ{X̄ > τ | θ}
                            = max_{θ∈(−∞,0]} Q( (τ − θ)/(σ/√N) ) = Q( τ/(σ/√N) ).

The most powerful test is achieved when the upper bound α in (18) is attained with equality:

τ = (σ/√N) Q⁻¹(α).

Hence, we have accomplished (2), since this τ yields exactly size α for our test φ(X).



Example 3: Detecting a Completely Unknown
DC Level in AWGN
Consider now the composite hypothesis-testing problem:

H0 : θ = 0   i.e. θ ∈ spΘ(0) = {0}   versus
H1 : θ ≠ 0   i.e. θ ∈ spΘ(1) = (−∞, +∞)\{0}

where the measurements X[0], X[1], . . . , X[N − 1] are conditionally i.i.d. given Θ = θ, following

fX | Θ(x | θ) = (2πσ²)^(−N/2) exp[ −(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − θ)² ]

and x = [x[0], x[1], . . . , x[N − 1]]ᵀ. A sufficient statistic for θ is x̄ = (1/N) Σ_{n=0}^{N−1} x[n] and the pdf of x̄ given Θ = θ is

fX̄ | Θ(x̄ | θ) = N(x̄ | θ, σ²/N).                                          (26)

We start by writing the classical Neyman-Pearson test for the simple hypotheses with spΘ(0) = {0} and spΘ(1) = {θ1 ≠ 0}: accept H1 if

θ1 x̄ > η.

We cannot accomplish (1), since θ1 cannot be removed from the test statistic; therefore, a UMP test does not exist for the above problem.



Monotone Likelihood-ratio Criterion

Consider a scalar parameter θ. We say that fX | Θ(x | θ) belongs to the monotone likelihood ratio (MLR) family if the pdfs (or pmfs) from this family

• satisfy the identifiability condition for θ (i.e. these pdfs are distinct for different values of θ) and

• admit a scalar statistic T(x) such that, for θ0 < θ1, the likelihood ratio

Λ(x ; θ0, θ1) = fX | Θ(x | θ1) / fX | Θ(x | θ0)

is a monotonically increasing function of T(x).

If fX | Θ(x | θ) belongs to the MLR family, then use the following test:

        { 1, for T(x) ≥ λ,
φλ(x) = { 0, for T(x) < λ



and set

α = PFA(φ(X), θ0) = PrX | Θ{T (X) ≥ λ | θ0} (27)

e.g. use this condition to find the threshold λ.

This test has the following properties:

(i) With α given by (27), φλ(x) is a UMP test of size α for testing

H0 : θ ≤ θ0 versus
H1 : θ > θ0.

(ii) For each λ, the power function

PrX | Θ{T (X) ≥ λ | θ} (28)

is a monotonically increasing function of θ.

Note: Consider the one-parameter exponential family

fX | Θ(x | θ) = h(x) exp[η(θ) T(x) − B(θ)].                              (29)

Then, if η(θ) is a monotonically increasing function of θ, the class of pdfs (pmfs) (29) satisfies the MLR conditions.



Example: Detection for Exponential Random
Variables
Consider conditionally i.i.d. measurements X[0], X[1], . . . , X[N − 1] given the parameter θ > 0, following the exponential pdf:

fX | Θ(x[n] | θ) = Expon(x[n] | 1/θ) = (1/θ) exp(−x[n]/θ) i(0,+∞)(x[n]).

The likelihood function of θ for all observations x = [x[0], x[1], . . . , x[N − 1]]ᵀ is

fX | Θ(x | θ) = (1/θ^N) exp[−T(x)/θ] Π_{n=0}^{N−1} i(0,+∞)(x[n])

where

T(x) = Σ_{n=0}^{N−1} x[n].
Since fX | Θ(x | θ) belongs to the one-parameter exponential family (29) and η(θ) = −θ⁻¹ is a monotonically increasing function of θ, the test

        { 1, for T(x) ≥ λ,
φλ(x) = { 0, for T(x) < λ



is UMP for testing

H0 : θ ≤ θ0 versus
H1 : θ > θ0.

The sum of i.i.d. exponential random variables follows the Erlang pdf (a special case of the gamma pdf):

fT | Θ(t | θ) = t^(N−1)/(θ^N (N − 1)!) · exp(−t/θ) i(0,+∞)(t) = Gamma(t | N, θ⁻¹).

Therefore, the size of the test can be written as

α = PrX | Θ{T(X) ≥ λ | θ0} = ∫_λ^{+∞} t^(N−1)/(θ0^N (N − 1)!) · exp(−t/θ0) dt
  = [ 1 + λ/θ0 + · · · + (1/(N − 1)!) (λ/θ0)^(N−1) ] exp(−λ/θ0)

where the integral is evaluated using integration by parts. For N = 1, we have

λ = θ0 ln(1/α).
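The closed-form size above can be sketched in Python (function name is ours); for N = 1 the formula inverts to λ = θ0 ln(1/α):

```python
from math import exp, factorial, log

def size_of_test(lam, theta0, N):
    """alpha = Pr{T(X) >= lam | theta0}: tail of the Erlang(N, 1/theta0)
    pdf, matching the integration-by-parts closed form above."""
    r = lam / theta0
    return exp(-r) * sum(r**k / factorial(k) for k in range(N))
```

For N > 1 the same expression can be inverted for λ numerically (e.g. by bisection), since the size is strictly decreasing in λ.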



Generalized Likelihood Ratio (GLR) Test

Recall again that, in composite testing of two hypotheses, we have spΘ(0) and spΘ(1) that form a partition of the parameter space spΘ:

spΘ(0) ∪ spΘ(1) = spΘ,   spΘ(0) ∩ spΘ(1) = ∅

and that we wish to identify which of the two hypotheses is true:

H0 : θ ∈ spΘ(0)   null hypothesis versus
H1 : θ ∈ spΘ(1)   alternative hypothesis.

In GLR tests, we replace the unknown parameters by their maximum-likelihood (ML) estimates under the two hypotheses. Hence, accept H1 if

ΛGLR(x) = max_{θ∈spΘ(1)} fX | Θ(x | θ) / max_{θ∈spΘ(0)} fX | Θ(x | θ) > τ.

This test has no UMP optimality properties, but often works well in practice.



Example: Detecting a Completely Unknown
DC Level in AWGN
Consider again the composite hypothesis-testing problem from Example 3:

H0 : θ = 0   i.e. θ ∈ spΘ(0) = {0}   versus
H1 : θ ≠ 0   i.e. θ ∈ spΘ(1) = (−∞, +∞)\{0}

where the measurements X[0], X[1], . . . , X[N − 1] are conditionally i.i.d. given Θ = θ, following

fX | Θ(x | θ) = (2πσ²)^(−N/2) exp[ −(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − θ)² ]

and x = [x[0], x[1], . . . , x[N − 1]]ᵀ. A sufficient statistic for θ is x̄ = (1/N) Σ_{n=0}^{N−1} x[n] and the pdf of x̄ given Θ = θ is

fX̄ | Θ(x̄ | θ) = N(x̄ | θ, σ²/N).

Our GLR test accepts H1 if

ΛGLR(x) = max_{θ∈spΘ(1)} fX̄ | Θ(x̄ | θ) / fX̄ | Θ(x̄ | 0) > τ.

Now,

x̄ = arg max_{θ∈spΘ(1)} fX̄ | Θ(x̄ | θ)



and

fX̄ | Θ(x̄ | 0) = N(x̄ | 0, σ²/N) = (1/√(2πσ²/N)) exp[ −½ x̄²/(σ²/N) ]

fX̄ | Θ(x̄ | x̄) = N(x̄ | x̄, σ²/N) = 1/√(2πσ²/N)

yielding

ln ΛGLR(x) = N x̄²/(2σ²).

Therefore, we accept H1 if

x̄² > γ

or

|x̄| > η.
We compare this detector with the (not realizable, also called clairvoyant) UMP detector that assumes knowledge of the sign of θ under H1. Assuming that the sign of θ under H1 is known, we can construct the UMP detector, whose ROC curve is given by

PD = Q( Q⁻¹(PFA) − d )

where d = √(N θ²/σ²) and θ is the value of the parameter under H1; see (23) for the case where θ > 0 under H1. All other detectors have PD below this upper bound.



GLR test: decide H1 if |x̄| > η. To make sure that the GLR test is implementable, we must be able to specify a threshold η so that the false-alarm probability is upper-bounded by a given size α. This is possible in our example:

PFA(φ(x), 0) = PrX̄ | Θ{|X̄| > η | 0}   [see (26)]
             = 2 PrX̄ | Θ{X̄ > η | 0}   (by symmetry)
             = 2 Q( η/√(σ²/N) )

PD(φ(x), θ) = PrX̄ | Θ{|X̄| > η | θ}   [see (26)]
            = PrX̄ | Θ{X̄ > η | θ} + PrX̄ | Θ{X̄ < −η | θ}
            = Q( (η − θ)/√(σ²/N) ) + Q( (η + θ)/√(σ²/N) )
            = Q( Q⁻¹(α/2) − θ/√(σ²/N) ) + Q( Q⁻¹(α/2) + θ/√(σ²/N) ).
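These expressions can be evaluated with a short sketch (helper names are ours); setting PFA = α gives η = √(σ²/N) Q⁻¹(α/2):

```python
from math import sqrt
from statistics import NormalDist

_nd = NormalDist()

def Q(x):
    """Right-tail probability of a standard normal."""
    return 1.0 - _nd.cdf(x)

def Qinv(p):
    """Inverse of Q."""
    return _nd.inv_cdf(1.0 - p)

def glr_two_sided(alpha, sigma2, N):
    """Threshold eta with 2 Q(eta/sqrt(sigma2/N)) = alpha, and the
    detection probability PD(theta) from the display above."""
    s = sqrt(sigma2 / N)
    eta = s * Qinv(alpha / 2.0)
    def pd(theta):
        return Q(Qinv(alpha / 2.0) - theta / s) + Q(Qinv(alpha / 2.0) + theta / s)
    return eta, pd
```

At θ = 0 the expression reduces to PD = α, confirming that the size is attained exactly.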

In this case, the GLR test is only slightly worse than the clairvoyant detector (Figure 6.4 in Kay-II).



Example: DC level in WGN with θ and σ² both unknown. Recall that σ² is called a nuisance parameter since we care exclusively about θ. Here, the GLR test for

H0 : θ = 0   i.e. θ ∈ spΘ(0) = {0}   versus
H1 : θ ≠ 0   i.e. θ ∈ spΘ(1) = (−∞, +∞)\{0}

accepts H1 if

ΛGLR(x) = max_{θ,σ²} fX | Θ,Σ²(x | θ, σ²) / max_{σ²} fX | Θ,Σ²(x | 0, σ²) > γ



where

fX | Θ,Σ²(x | θ, σ²) = (2πσ²)^(−N/2) exp[ −(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − θ)² ].    (30)
Here,

max_{θ,σ²} fX | Θ,Σ²(x | θ, σ²) = e^(−N/2) / [2π σ̂₁²(x)]^(N/2)
max_{σ²} fX | Θ,Σ²(x | 0, σ²) = e^(−N/2) / [2π σ̂₀²(x)]^(N/2)

where

σ̂₀²(x) = (1/N) Σ_{n=0}^{N−1} x²[n]
σ̂₁²(x) = (1/N) Σ_{n=0}^{N−1} (x[n] − x̄)².

Hence,

ΛGLR(x) = [ σ̂₀²(x) / σ̂₁²(x) ]^(N/2)

i.e. the GLR test fits the data with the "best" DC-level signal θ̂ML = x̄, finds the residual variance estimate σ̂₁², and compares this estimate with the variance estimate σ̂₀² under the null case (i.e.


for θ = 0). When a sufficiently strong signal is present, σ̂₁² ≪ σ̂₀² and ΛGLR(x) ≫ 1.
Note that

σ̂₁²(x) = (1/N) Σ_{n=0}^{N−1} (x[n] − x̄)²
        = (1/N) Σ_{n=0}^{N−1} ( x²[n] − 2 x̄ x[n] + x̄² )
        = (1/N) Σ_{n=0}^{N−1} x²[n] − 2 x̄² + x̄²
        = σ̂₀²(x) − x̄².

Hence,

2 ln ΛGLR(x) = N ln[ σ̂₀²(x) / (σ̂₀²(x) − x̄²) ] = N ln[ 1 / (1 − x̄²/σ̂₀²(x)) ].

Note that

0 ≤ x̄²/σ̂₀²(x) ≤ 1

and ln[1/(1 − z)] is monotonically increasing on z ∈ (0, 1). Therefore, an equivalent test can be constructed as follows:

T(x) = x̄²/σ̂₀²(x) > τ.
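Computing the equivalent statistic from data is a one-liner; this sketch (our helper name) also illustrates that 0 ≤ T(x) ≤ 1, since the mean of squares σ̂₀²(x) is never smaller than the squared mean x̄²:

```python
def glr_statistic(x):
    """T(x) = xbar^2 / sigma0hat^2 for the unknown-variance GLR test;
    sigma0hat^2 = (1/N) sum x[n]^2 is the null-case variance estimate."""
    N = len(x)
    xbar = sum(x) / N
    sigma0_sq = sum(v * v for v in x) / N
    return xbar * xbar / sigma0_sq
```

A constant record gives T = 1 (pure DC, no residual), while a zero-mean record gives T = 0.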



The pdf of T(X) given θ = 0 does not depend on σ² and, therefore, the GLR test can be implemented, i.e. it is CFAR.

Definition. A test is constant false alarm rate (CFAR) if we can find a threshold that yields a test whose size is equal to α.

In other words, we should be able to set the threshold independently of the unknown parameters, i.e. the distribution of the test statistic under H0 does not depend on the unknown parameters.

