
Outline:

• Neyman-Pearson test for simple binary hypotheses, receiver operating characteristic (ROC).

• An introduction to classical composite hypothesis testing.

Reading:

• Chapter 3 in Kay-II,

• (part of) Chapter 5 in Levy.

EE 527, Detection and Estimation Theory, # 5c 1


False-alarm and Detection Probabilities for
Binary Hypothesis Tests: A Reminder
(see handout # 5)

In binary hypothesis testing, we wish to identify which hypothesis is true (i.e. make the appropriate decision):

H0 : θ ∈ spΘ(0)   null hypothesis versus
H1 : θ ∈ spΘ(1)   alternative hypothesis

where

spΘ(0) ∪ spΘ(1) = spΘ, spΘ(0) ∩ spΘ(1) = ∅.

Recall that a binary decision rule φ(x) maps the data space X to {0, 1}:

         { 1, decide H1,
φ(x) =   {
         { 0, decide H0,

which partitions the data space X [i.e. the support of fX | Θ(x | θ)] into two regions:

X0 = {x : φ(x) = 0} and X1 = {x : φ(x) = 1}.



Recall the probabilities of false alarm and miss:

PFA(φ(X), θ) = E X | Θ[φ(X) | θ]
             = ∫_{X1} fX | Θ(x | θ) dx   for θ ∈ spΘ(0)                  (1)

PM(φ(X), θ) = E X | Θ[1 − φ(X) | θ]
            = 1 − ∫_{X1} fX | Θ(x | θ) dx   [the integral equals PD(φ(X), θ)]
            = ∫_{X0} fX | Θ(x | θ) dx   for θ ∈ spΘ(1)                   (2)

and the probability of detection (correctly deciding H1):


PD(φ(X), θ) = E X | Θ[φ(X) | θ] = ∫_{X1} fX | Θ(x | θ) dx   for θ ∈ spΘ(1).

For simple hypotheses, spΘ(0) = {θ0}, spΘ(1) = {θ1}, and spΘ = {θ0, θ1}, the above expressions simplify, as shown in the following.



Probabilities of False Alarm (PFA) and
Detection (PD) for Simple Hypotheses

PFA(φ(X), θ0) = ∫_{X1} fX | Θ(x | θ0) dx
              = PrX | Θ{test statistic (ts) > τ | θ0}                    (3)

PD(φ(X), θ1) = ∫_{X1} fX | Θ(x | θ1) dx
             = PrX | Θ{ts > τ | θ1}                                      (4)

where the event {ts > τ} is the same as the event {X ∈ X1}.

Comments:

(i) As the region X1 shrinks (i.e. τ ↗ +∞), both of the above probabilities shrink toward zero.

(ii) As the region X1 grows (i.e. τ ↘ 0), both probabilities grow toward unity.

(iii) Observations (i) and (ii) do not imply equality between PFA and PD; in most cases, as X1 grows, PD grows more rapidly than PFA (i.e. we had better be right more often than we are wrong).

(iv) However, the perfect case where our rule is always right
and never wrong (PD = 1 and PFA = 0) cannot occur when
the conditional pdfs/pmfs fX | Θ(x | θ0) and fX | Θ(x | θ1)
overlap.

(v) Thus, to increase the detection probability PD, we must also allow the false-alarm probability PFA to increase.
This behavior
• represents the fundamental tradeoff in hypothesis testing
and detection theory and
• motivates us to introduce a (classical) approach to testing
simple hypotheses, pioneered by Neyman and Pearson, to
be discussed next.

The receiver operating characteristic (ROC) allows us to visualize the realm of achievable PFA(φ(X), θ0) and PD(φ(X), θ1).



A point (PFA, PD) is in the shaded region if we can find a rule
φ(X) such that PFA(φ(X), θ0) = PFA and PD(φ(X), θ1) = PD.



Neyman-Pearson Test for Simple Hypotheses

Bayesian tests are criticized because they require specification of a prior distribution (pmf or, in the composite-testing case, pdf) and the cost-function parameters L(i | j).

An alternative classical solution for simple hypotheses was developed by Neyman and Pearson.

Select the decision rule φ(X) that maximizes PD(φ(X), θ1) while ensuring that the probability of false alarm PFA(φ(X), θ0) is less than or equal to a specified level α.

Setup:

• Simple hypothesis testing:

H0 : θ = θ0 versus
H1 : θ = θ1.

• Parametric data models fX | Θ(x | θ0), fX | Θ(x | θ1).

• No prior pdf/pmf on Θ is available.



• Define the set of all rules φ(X) whose probability of false alarm is less than or equal to a specified level α:

Dα = {φ(X) : PFA(φ(X), θ0) ≤ α}

see also (3).

A Neyman-Pearson test φNP(x) solves the constrained optimization problem:

φNP(x) = arg max_{φ(x)∈Dα} PD(φ(x), θ1).

We apply Lagrange multipliers to solve this optimization problem; consider the Lagrangian:

L(φ(x), λ) = PD(φ(x), θ1) + λ [α − PFA(φ(x), θ0)]

with λ ≥ 0. A decision rule φ(x) will be optimal if it maximizes L(φ(x), λ) and satisfies the Karush-Kuhn-Tucker (KKT) condition:

λ [α − PFA(φ(x), θ0)] = 0.                                               (5)

Upon using (3) and (4), the Lagrangian can be written as

L(φ(x), λ) = λ α + ∫_{X1} [fX | Θ(x | θ1) − λ fX | Θ(x | θ0)] dx.



Consider maximizing L(φ(x), λ) with respect to φ(x) for a
given λ. Then, φ(x) needs to satisfy

         { 1,      Λ(x) > λ
φλ(x) =  { 0 or 1, Λ(x) = λ                                              (6)
         { 0,      Λ(x) < λ

where

Λ(x) = fX | Θ(x | θ1) / fX | Θ(x | θ0)
is the likelihood ratio. The values x that satisfy Λ(x) = λ
can be allocated to either X1 or X0. To completely specify the
optimal test, we need to select

• a λ such that the KKT condition (5) holds and

• an allocation rule for those x that satisfy Λ(x) = λ.
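As a concrete sketch of the likelihood-ratio rule (6), the following Python fragment evaluates Λ(x) for a hypothetical scalar Gaussian pair of densities (the means mu0, mu1 and the noise level sigma are illustrative values, not from the notes) and applies the threshold, allocating the ties Λ(x) = λ to X0 for simplicity:

```python
from math import exp, pi, sqrt

def likelihood_ratio(x, mu0=0.0, mu1=1.0, sigma=1.0):
    """Lambda(x) = f(x | theta1) / f(x | theta0) for an illustrative
    scalar Gaussian example; mu0, mu1, sigma are assumed values."""
    def normal_pdf(v, mu, s):
        return exp(-0.5 * ((v - mu) / s) ** 2) / (s * sqrt(2.0 * pi))
    return normal_pdf(x, mu1, sigma) / normal_pdf(x, mu0, sigma)

def phi(x, lam):
    """Rule (6), with ties Lambda(x) = lam allocated to X0 (decide H0)."""
    return 1 if likelihood_ratio(x) > lam else 0
```

For mu1 > mu0 the likelihood ratio is increasing in x, so the rule reduces to comparing x itself against a threshold, as in the examples later in these notes.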

Now, consider two versions of (6) for a fixed threshold λ:



          { 1, Λ(x) > λ
φU,λ(x) = { 1, Λ(x) = λ
          { 0, Λ(x) < λ

and

          { 1, Λ(x) > λ
φL,λ(x) = { 0, Λ(x) = λ
          { 0, Λ(x) < λ.



In the first case, all observations x for which Λ(x) = λ are
allocated to X1; in the second case, these observations are
allocated to X0.
Consider the cumulative distribution function (cdf) of Λ(X) =
Λ under H0:

FΛ | Θ(l | θ0) = PrΛ | Θ{Λ ≤ l | θ0}.

Define
f0 = FΛ | Θ(0 | θ0) = PrΛ | Θ{Λ ≤ 0 | θ0}.
Recall that the cdf FΛ | Θ(l | θ0) must be nondecreasing and right-continuous, but may have discontinuities.



Consider three cases, depending on α:

(i) When

1 − α < f0   i.e.   1 − f0 < α                                           (7)

we select the threshold λ = 0 and apply the rule

          { 1, Λ(x) > 0
φL,0(x) = {                                                              (8)
          { 0, Λ(x) = 0.

In this case, KKT condition (5) holds and, therefore, the test (8) is optimal; its probability of false alarm is

PFA(φL,0(x), θ0) = 1 − f0 < α   [see (7)].

An example of this case corresponds to λ1 = 0 and 1 − α1 in the above figure.

(ii) Suppose that

1 − α ≥ f0 i.e. 1 − f0 ≥ α (9)

and there exists a λ such that

FΛ | Θ(λ | θ0) = 1 − α. (10)

Then, by selecting this λ as the threshold and using

          { 1, Λ(x) > λ
φL,λ(x) = {                                                              (11)
          { 0, Λ(x) ≤ λ



we obtain a test with false-alarm probability

PFA(φL,λ(x), θ0) = 1 − FΛ | Θ(λ | θ0) = α   [see (10)],

the KKT condition (5) holds, and the test (11) is optimal. An example of this case corresponds to λ2 and 1 − α2 in the above figure.

(iii) Suppose that

1 − α ≥ f0   i.e.   1 − f0 ≥ α

as in (ii), but the cdf FΛ | Θ(l | θ0) has a discontinuity point λ > 0 such that

FΛ | Θ(λ− | θ0) < 1 − α < FΛ | Θ(λ+ | θ0)

where FΛ | Θ(λ− | θ0) and FΛ | Θ(λ+ | θ0) denote the left and right limits of FΛ | Θ(l | θ0) at l = λ. If this case happens in practice, we can try to avoid the problem by changing our specified α, which is anyway not God-given, but chosen rather arbitrarily. We should pick a value of α that satisfies the KKT condition.

Suppose that we are not allowed to change α; this gives us a chance to practice some basic probability. First, note that



• φL,λ(x) has false-alarm probability

PFA(φL,λ(x), θ0) = 1 − FΛ | Θ(λ+ | θ0) < α,

• φU,λ(x) has false-alarm probability

PFA(φU,λ(x), θ0) = 1 − FΛ | Θ(λ− | θ0) > α

and the KKT optimality condition (5) requires that PFA(φλ(x), θ0) = α. We focus on tests of the form (6) and construct the optimal test via randomization.

Define the probability

p = [α − PFA(φL,λ(x), θ0)] / [PFA(φU,λ(x), θ0) − PFA(φL,λ(x), θ0)]

which clearly satisfies 0 < p < 1.



Select φU,λ(x) with probability p and φL,λ(x) with
probability 1 − p. This test indeed has the form (6); its
probability of false alarm is

PFA(φλ(x), θ0)
= PFA(φL,λ(x), θ0) + p [PFA(φU,λ(x), θ0) − PFA(φL,λ(x), θ0)] = α.

Since KKT condition (5) is satisfied, the randomized test

         { 1,                            Λ(x) > λ
φλ(x) =  { 1 w.p. p and 0 w.p. 1 − p,   Λ(x) = λ
         { 0,                            Λ(x) < λ

is optimal.
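The randomization step can be sketched numerically. In this hedged Python fragment, the level α and the two false-alarm probabilities at the cdf discontinuity are made-up illustrative numbers; the point is only that the mixture attains PFA = α exactly:

```python
def randomization_prob(alpha, pfa_L, pfa_U):
    """p = (alpha - PFA_L) / (PFA_U - PFA_L); requires PFA_L < alpha < PFA_U."""
    assert pfa_L < alpha < pfa_U
    return (alpha - pfa_L) / (pfa_U - pfa_L)

# hypothetical numbers at a discontinuity of the cdf of Lambda given theta0
alpha, pfa_L, pfa_U = 0.05, 0.02, 0.10
p = randomization_prob(alpha, pfa_L, pfa_U)
# false-alarm probability of the randomized mixture; equals alpha by construction
pfa_mixed = pfa_L + p * (pfa_U - pfa_L)
```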



ROC Properties when Likelihood Ratio is a
Continuous Random Variable Given θ

Based on the Neyman-Pearson theory, if we set PFA = α, then the test that maximizes PD must be a likelihood-ratio test of the form (6). Thus, the ROC curve separating achievable and non-achievable pairs (PFA, PD) corresponds to the family of likelihood-ratio tests.

For simplicity, we focus here on the case where the likelihood ratio is a continuous random variable given θ. First, note that,



for the likelihood-ratio test,

PFA(τ) = ∫_{X1} fX | Θ(x | θ0) dx = PrX | Θ{Λ(X) > τ | θ0} = ∫_τ^{+∞} fΛ | Θ(l | θ0) dl    (12)

PD(τ) = ∫_{X1} fX | Θ(x | θ1) dx = PrX | Θ{Λ(X) > τ | θ1} = ∫_τ^{+∞} fΛ | Θ(l | θ1) dl     (13)

where τ denotes the threshold. Under the continuity assumption for the likelihood ratio, as we vary τ between 0 and +∞, the point (PFA(φ(X), θ0), PD(φ(X), θ1)) moves continuously along the ROC curve. If we set τ = 0, we always select H1 and, therefore,

PFA(0) = PD(0) = 1.

Conversely, if we set τ = +∞, we always select H0 and, therefore,

PFA(+∞) = PD(+∞) = 0.
In summary,
ROC Property 1. If the likelihood ratio is a continuous random variable given θ, the points (0, 0) and (1, 1) belong to the ROC.



Now, differentiate (12) and (13) with respect to τ:

dPD(τ)/dτ = −fΛ | Θ(τ | θ1)
dPFA(τ)/dτ = −fΛ | Θ(τ | θ0)

implying

dPD(τ)/dPFA(τ) = fΛ | Θ(τ | θ1) / fΛ | Θ(τ | θ0) = τ.
In summary,
ROC Property 2. If the likelihood ratio is a continuous
random variable given θ, the slope of ROC at point
(PFA(τ ), PD(τ )) is equal to the threshold τ of the corresponding
likelihood-ratio test.
In particular, this result implies that the slope of ROC is

• τ = +∞ at (0, 0) and

• τ = 0 at (1, 1).
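ROC Properties 2 and 4 can be checked numerically for an assumed scalar Gaussian mean-shift model (X ~ N(0, 1) under H0 and X ~ N(d, 1) under H1, so that (12)-(13) have the closed forms below); a centered finite difference recovers the slope τ:

```python
from math import log
from statistics import NormalDist

_nd = NormalDist()

def Q(x):
    """Right-tail probability of a standard normal."""
    return 1.0 - _nd.cdf(x)

def roc_point(tau, d):
    """(PFA(tau), PD(tau)) of the LR test for a scalar Gaussian mean
    shift with deflection d; valid under the stated assumptions."""
    a = log(tau) / d
    return Q(a + d / 2.0), Q(a - d / 2.0)

def roc_slope(tau, d, h=1e-6):
    """Centered finite-difference estimate of dPD/dPFA at threshold tau."""
    pfa_hi, pd_hi = roc_point(tau + h, d)
    pfa_lo, pd_lo = roc_point(tau - h, d)
    return (pd_hi - pd_lo) / (pfa_hi - pfa_lo)
```

The estimated slope at τ = 1.5 comes out close to 1.5, as ROC Property 2 predicts, and PD stays above PFA along the curve, as ROC Property 4 predicts.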

ROC Property 3. The domain of achievable pairs (PFA, PD) is convex and the ROC curve is concave. This property holds in general, including the case where the likelihood ratio is a mixed or discrete random variable given θ.

HW: Prove ROC Property 3.



ROC Property 4. All points on the ROC curve satisfy

PD(τ) ≥ PFA(τ).

This property holds in general, including the case where the likelihood ratio is a mixed or discrete random variable given θ.



Example: Simple Hypotheses,
Coherent Detection in Gaussian Noise with
Known Covariance Matrix

Simple hypotheses: the space of the parameter µ and its partitions are

spµ = {µ0, µ1},   spµ(0) = {µ0},   spµ(1) = {µ1}.

The measurement vector X given µ is modeled using

fX | µ(x | µ) = N(x | µ, C) = (1/√|2πC|) exp[−½ (x − µ)ᵀ C⁻¹ (x − µ)]

where C is a known positive definite covariance matrix. Our likelihood-ratio test is

Λ(x) = fX | µ(x | µ1) / fX | µ(x | µ0)        (likelihood ratio)
     = exp[−½ (x − µ1)ᵀ C⁻¹ (x − µ1)] / exp[−½ (x − µ0)ᵀ C⁻¹ (x − µ0)]  ≷  τ   (decide H1 if >).



Therefore,

−½ (x − µ1)ᵀ C⁻¹ (x − µ1) + ½ (x − µ0)ᵀ C⁻¹ (x − µ0)  ≷  ln τ   (decide H1 if >)

i.e.

(µ1 − µ0)ᵀ C⁻¹ [x − ½ (µ0 + µ1)]  ≷  ln τ

and, finally,

T(x) = sᵀ C⁻¹ x  ≷  ln τ + ½ (µ1 − µ0)ᵀ C⁻¹ (µ1 + µ0)  ≜  γ   (decide H1 if >)

where we have defined

s ≜ µ1 − µ0.

False-alarm and detection/miss probabilities. Given µ, T(X) is a linear combination of Gaussian random variables, implying that it is also Gaussian, with mean and variance:

E X | µ[T(X) | µ] = sᵀ C⁻¹ µ
varX | µ[T(X) | µ] = sᵀ C⁻¹ s   (not a function of µ).



Now,

PFA = PrX | µ{T(X) > γ | µ0}
    = PrX | µ{ [T(X) − sᵀC⁻¹µ0]/√(sᵀC⁻¹s)  >  [γ − sᵀC⁻¹µ0]/√(sᵀC⁻¹s)  | µ0 }
      (the quantity on the left of the inequality is a standard normal random variable)
    = Q( [γ − sᵀC⁻¹µ0]/√(sᵀC⁻¹s) )                                       (14)

and

PD = 1 − PM = PrX | µ{T(X) > γ | µ1}
   = PrX | µ{ [T(X) − sᵀC⁻¹µ1]/√(sᵀC⁻¹s)  >  [γ − sᵀC⁻¹µ1]/√(sᵀC⁻¹s)  | µ1 }
     (the quantity on the left of the inequality is a standard normal random variable)
   = Q( [γ − sᵀC⁻¹µ1]/√(sᵀC⁻¹s) ).

We use (14) to obtain a γ that satisfies the specified PFA:

γ/√(sᵀC⁻¹s) = Q⁻¹(PFA) + sᵀC⁻¹µ0/√(sᵀC⁻¹s)
implying
PD = Q( Q⁻¹(PFA) − √(sᵀC⁻¹s) )
   = Q( Q⁻¹(PFA) − d ).                                                  (15)

Here,

d = √(sᵀC⁻¹s) = √( (µ1 − µ0)ᵀ C⁻¹ (µ1 − µ0) )

is the deflection coefficient.
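The ROC expression (15) is easy to evaluate numerically. The following sketch uses the stdlib standard-normal helpers (`statistics.NormalDist`); the signal and covariance values are made-up illustrations for a diagonal C:

```python
from math import sqrt
from statistics import NormalDist

_nd = NormalDist()

def Q(x):
    """Right-tail probability of a standard normal."""
    return 1.0 - _nd.cdf(x)

def Qinv(p):
    """Inverse of Q."""
    return _nd.inv_cdf(1.0 - p)

def coherent_pd(pfa, d):
    """PD = Q(Qinv(PFA) - d), equation (15)."""
    return Q(Qinv(pfa) - d)

# deflection coefficient d = sqrt(s^T C^{-1} s) for an illustrative
# diagonal covariance C = diag(c) and difference signal s = mu1 - mu0
s = [1.0, 2.0]
c = [1.0, 4.0]
d = sqrt(sum(si * si / ci for si, ci in zip(s, c)))
```

Larger d pushes the ROC toward the ideal (PFA, PD) = (0, 1) corner, which is the sense in which d measures detectability.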



Decentralized Detection for Simple Hypotheses

Consider a decentralized detection scenario with N spatially distributed sensors (nodes) reporting to a fusion center.

Assumptions:

• The observations X[n], n = 0, 1, . . . , N − 1 made at the N sensors follow the same marginal probabilistic model:

fX | Θ(x[n] | θ)                                                         (16)

and are conditionally independent given Θ = θ, which may not always be reasonable, but leads to an easy solution.



• We wish to test:

H0 : θ = θ0 versus
H1 : θ = θ1.

• Each node n makes a hard local decision d[n] based on its local observation x[n] and sends it to the headquarters
(fusion center), which collects all the local decisions and
makes the final global decision H0 versus H1. This structure
is clearly suboptimal: it is easy to construct a better
decision strategy in which each node sends its (quantized,
in practice) likelihood ratio to the fusion center, rather than
the decision only. However, such a strategy would have a
higher communication (energy) cost.
The false-alarm and detection probabilities of each node’s
local decision rules can be computed using (16). Suppose
that we have obtained them for each n:

PFA,n, PD,n, n = 0, 1, . . . , N − 1.

We now discuss the decentralized detection problem. Note that

pD(n) | Θ(dn | θ1) = PD,n^dn (1 − PD,n)^(1−dn)     (Bernoulli pmf)



and, similarly,

pD(n) | Θ(dn | θ0) = PFA,n^dn (1 − PFA,n)^(1−dn)     (Bernoulli pmf)

where PFA,n is the nth sensor's local false-alarm probability. Now,

ln Λ(d) = Σ_{n=0}^{N−1} ln[ pD(n) | Θ(dn | θ1) / pD(n) | Θ(dn | θ0) ]
        = Σ_{n=0}^{N−1} ln[ PD,n^dn (1 − PD,n)^(1−dn) / ( PFA,n^dn (1 − PFA,n)^(1−dn) ) ]  ≷  ln τ   (decide H1 if >).

To further simplify the above expression, we now focus on the case where all sensors have identical performance:

PD,n = PD,   PFA,n = PFA

i.e. all local decision thresholds at the nodes are identical.


Define the number of sensors deciding locally to support H1:

u1 = Σ_{n=0}^{N−1} d[n].



Then, the log-likelihood ratio becomes

ln Λ(d) = u1 ln(PD/PFA) + (N − u1) ln[(1 − PD)/(1 − PFA)]  ≷  ln τ   (decide H1 if >)

or

u1 ln[ PD (1 − PFA) / (PFA (1 − PD)) ]  ≷  ln τ + N ln[ (1 − PFA)/(1 − PD) ].    (17)

Clearly, each node's local decision dn is meaningful only if PD > PFA, which implies

PD (1 − PFA) / (PFA (1 − PD)) > 1

the logarithm of which is therefore positive, and the decision rule (17) further simplifies to

u1 ≷ τ′   (decide H1 if >).

The Neyman-Pearson performance analysis of this detector is easy: the random variable U1 is binomial given θ (i.e. conditional on the hypothesis) and, therefore,

PrU1 | Θ{U1 = u1 | θ} = (N choose u1) p^u1 (1 − p)^(N−u1)



where p = PFA under H0 and p = PD under H1. Hence, the "global" false-alarm probability is

PFA,global = PrU1 | Θ{U1 > τ′ | θ0} = Σ_{u1=⌈τ′⌉}^{N} (N choose u1) PFA^u1 (1 − PFA)^(N−u1).
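For an integer threshold, the binomial tail above can be computed directly. This is a sketch (the sensor count, local PFA, and threshold below are illustrative), with Pr{U1 > k | H0} summed from u1 = k + 1:

```python
from math import comb

def global_pfa(N, pfa_local, k):
    """Pr{U1 > k | theta0} for N identical sensors, each with local
    false-alarm probability pfa_local (binomial tail)."""
    return sum(comb(N, u) * pfa_local**u * (1.0 - pfa_local)**(N - u)
               for u in range(k + 1, N + 1))
```

For example, with N = 5 sensors of local PFA = 0.1, requiring at least one local detection (k = 0) gives a global false-alarm probability of 1 − 0.9⁵; the same function with p = PD in place of PFA gives the global detection probability.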



An Introduction to Classical Composite
Hypothesis Testing

First, recall that, in composite testing of two hypotheses, we have spΘ(0) and spΘ(1) that form a partition of the parameter space spΘ:

spΘ(0) ∪ spΘ(1) = spΘ,   spΘ(0) ∩ spΘ(1) = ∅

and that we wish to identify which of the two hypotheses is true:

H0 : Θ ∈ spΘ(0)   null hypothesis versus
H1 : Θ ∈ spΘ(1)   alternative hypothesis.

Here, we adopt the classical Neyman-Pearson approach: given an upper bound α on the false-alarm probability, maximize the detection probability.

The fact that H0 is composite means that the false-alarm probability for a rule φ(X) is a function of θ:

PFA(φ(X), θ)

where θ ∈ spΘ(0). Therefore, to satisfy the upper bound α, we



consider all tests φ(X) such that

max_{θ∈spΘ(0)} PFA(φ(X), θ) ≤ α.                                         (18)

In this context,

max_{θ∈spΘ(0)} PFA(φ(X), θ)                                              (19)

is typically referred to as the size of the test φ(X). Therefore, condition (18) states that we focus on tests whose size is upper-bounded by α.

Definition. Among all tests φ(X) whose size is upper-bounded by α [i.e. (18) holds], we say that φUMP(X) is a uniformly most powerful (UMP) test if it satisfies

PD(φUMP(X), θ) ≥ PD(φ(X), θ)

for all θ ∈ spΘ(1).

This is a very strong statement, and very few hypothesis-testing problems have UMP tests. Note that Neyman-Pearson tests for simple hypotheses are UMP.

Hence, to find a UMP test for composite hypotheses, we need to first write a likelihood ratio for the simple hypothesis test with spΘ(0) = {θ0}, spΘ(1) = {θ1}, and spΘ = {θ0, θ1}, and then transform this likelihood ratio in such a way that unknown quantities (e.g. θ0 and θ1) disappear from the test statistic.



(1) If such a transformation can be found, there is hope that a UMP test exists.

(2) However, we still need to figure out how to set a decision threshold (τ, say) such that the upper bound (18) is satisfied.



Example 1: Detecting a Positive DC Level in
AWGN (versus zero DC level)

Consider the following composite hypothesis-testing problem:

H0 : θ = 0   i.e. θ ∈ spΘ(0) = {0}   versus
H1 : θ > 0   i.e. θ ∈ spΘ(1) = (0, +∞)

where the measurements X[0], X[1], . . . , X[N − 1] are conditionally independent, identically distributed (i.i.d.) given Θ = θ, modeled as

{X[n] | Θ = θ} = θ + W[n],   n = 0, 1, . . . , N − 1

with W[n] zero-mean white Gaussian noise with known variance σ², i.e.

W[n] ∼ N(0, σ²)

implying

fX | Θ(x | θ) = (2πσ²)^(−N/2) exp[ −(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − θ)² ]    (20)

where x = [x[0], x[1], . . . , x[N − 1]]ᵀ. A sufficient statistic for θ is

x̄ = (1/N) Σ_{n=0}^{N−1} x[n].



Now, find the pdf of x̄ given Θ = θ:

fX̄ | Θ(x̄ | θ) = N(x̄ | θ, σ²/N).                                          (21)

We start by writing the classical Neyman-Pearson test for the simple hypotheses with spΘ^simple(0) = {0} and spΘ^simple(1) = {θ1}, θ1 ∈ (0, +∞):

fX̄ | Θ(x̄ | θ1) / fX̄ | Θ(x̄ | 0)
  = { (2πσ²/N)^(−1/2) exp[−(x̄ − θ1)²/(2σ²/N)] } / { (2πσ²/N)^(−1/2) exp[−x̄²/(2σ²/N)] }  ≷  λ   (decide H1 if >).

Taking the log etc. leads to

θ1 x̄ ≷ η   (decide H1 if >).

Since we know that θ1 > 0, we can divide both sides of the above expression by θ1 and accept H1 if

φ(x) :  x̄ > τ.

Hence, we transformed our likelihood ratio in such a way that θ1 disappears from the test statistic, i.e. we accomplished (1) above.

Now, on to (2). How do we determine the threshold τ such that the upper bound (18) is satisfied? Based on (21), we know:

fX̄ | Θ(x̄ | 0) = N(x̄ | 0, σ²/N)



and, therefore,

PFA(φ(X), 0) = PrX̄ | Θ{X̄ > τ | 0}
             = PrX̄ | Θ{ (X̄ − 0)/√(σ²/N) > τ/√(σ²/N) | 0 }
               (the quantity on the left of the inequality is a standard normal random variable)
             = Q( τ/√(σ²/N) ).

Note that

max_{θ∈spΘ(0)} PFA(φ(X), θ) = PFA(φ(X), 0) = Q( τ/√(σ²/N) ) = α

[see (18) and (19)]. The most powerful test is achieved when the upper bound α in (18) is attained with equality:

τ = √(σ²/N) · Q⁻¹(α).                                                    (22)

Hence, we have accomplished (2), since this τ yields exactly size α for our test φ(X).

To study the performance of the above test, we substitute


(22) into the power function:

PrX̄ | Θ{X̄ > τ | θ} = PrX̄ | Θ{ (X̄ − θ)/√(σ²/N) > (τ − θ)/√(σ²/N) | θ }
                      (the quantity on the left of the inequality is a standard normal random variable)
                    = Q( (τ − θ)/√(σ²/N) ) = Q( Q⁻¹(α) − θ/√(σ²/N) ).    (23)
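A quick numeric sketch of (22) and (23), using the stdlib standard-normal helpers (the function names below are ours, not from the notes):

```python
from math import sqrt
from statistics import NormalDist

_nd = NormalDist()

def Q(x):
    """Right-tail probability of a standard normal."""
    return 1.0 - _nd.cdf(x)

def Qinv(p):
    """Inverse of Q."""
    return _nd.inv_cdf(1.0 - p)

def dc_level_test(alpha, sigma2, N):
    """Threshold (22) and power function (23) for the one-sided DC-level test."""
    tau = sqrt(sigma2 / N) * Qinv(alpha)                      # equation (22)
    def power(theta):
        return Q(Qinv(alpha) - theta / sqrt(sigma2 / N))      # equation (23)
    return tau, power
```

The power equals α at θ = 0 and increases with θ, which is the monotonicity exploited again in Example 2.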



Example 2: Detecting a Positive DC Level in
AWGN (versus nonnegative DC level)

Consider the following composite hypothesis-testing problem:

H0 : θ ≤ 0   i.e. θ ∈ spΘ(0) = (−∞, 0]   versus
H1 : θ > 0   i.e. θ ∈ spΘ(1) = (0, +∞)

where the measurements X[0], X[1], . . . , X[N − 1] are conditionally i.i.d. given Θ = θ, modeled as

{X[n] | Θ = θ} = θ + W[n],   n = 0, 1, . . . , N − 1

with W[n] zero-mean white Gaussian noise with known variance σ², i.e.

W[n] ∼ N(0, σ²)

implying

fX | Θ(x | θ) = (2πσ²)^(−N/2) exp[ −(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − θ)² ]    (24)

where x = [x[0], x[1], . . . , x[N − 1]]ᵀ. A sufficient statistic for θ is

x̄ = (1/N) Σ_{n=0}^{N−1} x[n]



and

fX̄ | Θ(x̄ | θ) = N(x̄ | θ, σ²/N).                                          (25)

We start by writing the classical Neyman-Pearson test for the simple hypotheses with spΘ^simple(0) = {θ0} and spΘ^simple(1) = {θ1}, where θ0 ∈ (−∞, 0] and θ1 ∈ (0, +∞), implying

fX̄ | Θ(x̄ | θ1) / fX̄ | Θ(x̄ | θ0)
  = { (2πσ²/N)^(−1/2) exp[−(x̄ − θ1)²/(2σ²/N)] } / { (2πσ²/N)^(−1/2) exp[−(x̄ − θ0)²/(2σ²/N)] }  ≷  λ   (decide H1 if >)

with θ0 < θ1.
Taking the log etc. leads to

(θ1 − θ0) x̄ ≷ η   (decide H1 if >)

and, since θ0 < θ1, to

φ(x) :  x̄ > τ.

Hence, we transformed our likelihood ratio in such a way that θ0 and θ1 disappear from the test statistic, i.e. we accomplished (1) above.

The power function of this test is

PrX̄ | Θ{X̄ > τ | θ} = PrX̄ | Θ{ (X̄ − θ)/(σ/√N) > (τ − θ)/(σ/√N) | θ } = Q( (τ − θ)/(σ/√N) )



which is an increasing function of θ. Recall the definition (19) of test size:

max_{θ∈spΘ(0)} PFA(φ(X), θ) = max_{θ∈spΘ(0)} PrX̄ | Θ{X̄ > τ | θ}
                            = max_{θ∈(−∞,0]} Q( (τ − θ)/(σ/√N) ) = Q( τ/(σ/√N) ).

The most powerful test is achieved when the upper bound α in (18) is attained with equality:

τ = (σ/√N) Q⁻¹(α).

Hence, we have accomplished (2), since this τ yields exactly size α for our test φ(X).



Example 3: Detecting a Completely Unknown
DC Level in AWGN
Consider now the composite hypothesis-testing problem:

H0 : θ = 0   i.e. θ ∈ spΘ(0) = {0}   versus
H1 : θ ≠ 0   i.e. θ ∈ spΘ(1) = (−∞, +∞)\{0}

where the measurements X[0], X[1], . . . , X[N − 1] are conditionally i.i.d. given Θ = θ, following

fX | Θ(x | θ) = (2πσ²)^(−N/2) exp[ −(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − θ)² ]

and x = [x[0], x[1], . . . , x[N − 1]]ᵀ. A sufficient statistic for θ is x̄ = (1/N) Σ_{n=0}^{N−1} x[n] and the pdf of x̄ given Θ = θ is

fX̄ | Θ(x̄ | θ) = N(x̄ | θ, σ²/N).                                          (26)

We start by writing the classical Neyman-Pearson test for the simple hypotheses with spΘ(0) = {0} and spΘ(1) = {θ1 ≠ 0}: accept H1 if

θ1 x̄ > η.

We cannot accomplish (1), since θ1 cannot be removed from the test statistic; therefore, a UMP test does not exist for the above problem.



Monotone Likelihood-ratio Criterion

Consider a scalar parameter θ. We say that fX | Θ(x | θ) belongs to the monotone likelihood ratio (MLR) family if the pdfs (or pmfs) from this family

• satisfy the identifiability condition for θ (i.e. these pdfs are distinct for different values of θ) and

• admit a scalar statistic T(x) such that, for θ0 < θ1, the likelihood ratio

Λ(x ; θ0, θ1) = fX | Θ(x | θ1) / fX | Θ(x | θ0)

is a monotonically increasing function of T(x).

If fX | Θ(x | θ) belongs to the MLR family, then use the following test:

        { 1, for T(x) ≥ λ,
φλ(x) = { 0, for T(x) < λ



and set

α = PFA(φ(X), θ0) = PrX | Θ{T (X) ≥ λ | θ0} (27)

e.g. use this condition to find the threshold λ.

This test has the following properties:

(i) With α given by (27), φλ(x) is a UMP test of size α for testing

H0 : θ ≤ θ0 versus
H1 : θ > θ0.

(ii) For each λ, the power function

PrX | Θ{T (X) ≥ λ | θ} (28)

is a monotonically increasing function of θ.

Note: Consider the one-parameter exponential family

fX | Θ(x | θ) = h(x) exp[η(θ) T(x) − B(θ)].                              (29)

Then, if η(θ) is a monotonically increasing function of θ, the class of pdfs (pmfs) (29) satisfies the MLR conditions.



Example: Detection for Exponential Random
Variables
Consider conditionally i.i.d. measurements X[0], X[1], . . . , X[N − 1] given the parameter θ > 0, following the exponential pdf:

fX | Θ(x[n] | θ) = Expon(x[n] | 1/θ) = (1/θ) exp(−x[n]/θ) i(0,+∞)(x[n]).

The likelihood function of θ for all observations x = [x[0], x[1], . . . , x[N − 1]]ᵀ is

fX | Θ(x | θ) = (1/θ^N) exp[−T(x)/θ] Π_{n=0}^{N−1} i(0,+∞)(x[n])

where

T(x) = Σ_{n=0}^{N−1} x[n].
Since fX | Θ(x | θ) belongs to the one-parameter exponential family (29) and η(θ) = −θ⁻¹ is a monotonically increasing function of θ, the test

        { 1, for T(x) ≥ λ,
φλ(x) = { 0, for T(x) < λ



is UMP for testing

H0 : θ ≤ θ0 versus
H1 : θ > θ0.

The sum of i.i.d. exponential random variables follows the Erlang pdf (a special case of the gamma pdf):

fT | Θ(t | θ) = t^(N−1)/(θ^N (N − 1)!) · exp(−t/θ) i(0,+∞)(t) = Gamma(t | N, θ⁻¹).

Therefore, the size of the test can be written as

α = PrX | Θ{T(X) ≥ λ | θ0} = ∫_λ^{+∞} t^(N−1)/(θ0^N (N − 1)!) · exp(−t/θ0) dt
  = [ 1 + λ/θ0 + · · · + (1/(N − 1)!) (λ/θ0)^(N−1) ] exp(−λ/θ0)

where the integral is evaluated using integration by parts. For N = 1, we have

λ = θ0 ln(1/α).
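The closed-form size above can be sketched in Python (function name is ours); for N = 1 the formula inverts to λ = θ0 ln(1/α):

```python
from math import exp, factorial, log

def size_of_test(lam, theta0, N):
    """alpha = Pr{T(X) >= lam | theta0}: tail of the Erlang(N, 1/theta0)
    pdf, matching the integration-by-parts closed form above."""
    r = lam / theta0
    return exp(-r) * sum(r**k / factorial(k) for k in range(N))
```

For N > 1 the same expression can be inverted for λ numerically (e.g. by bisection), since the size is strictly decreasing in λ.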



Generalized Likelihood Ratio (GLR) Test

Recall again that, in composite testing of two hypotheses, we have spΘ(0) and spΘ(1) that form a partition of the parameter space spΘ:

spΘ(0) ∪ spΘ(1) = spΘ,   spΘ(0) ∩ spΘ(1) = ∅

and that we wish to identify which of the two hypotheses is true:

H0 : θ ∈ spΘ(0)   null hypothesis versus
H1 : θ ∈ spΘ(1)   alternative hypothesis.

In GLR tests, we replace the unknown parameters by their maximum-likelihood (ML) estimates under the two hypotheses. Hence, accept H1 if

ΛGLR(x) = max_{θ∈spΘ(1)} fX | Θ(x | θ) / max_{θ∈spΘ(0)} fX | Θ(x | θ) > τ.

This test has no UMP optimality properties, but often works well in practice.



Example: Detecting a Completely Unknown
DC Level in AWGN
Consider again the composite hypothesis-testing problem from Example 3:

H0 : θ = 0   i.e. θ ∈ spΘ(0) = {0}   versus
H1 : θ ≠ 0   i.e. θ ∈ spΘ(1) = (−∞, +∞)\{0}

where the measurements X[0], X[1], . . . , X[N − 1] are conditionally i.i.d. given Θ = θ, following

fX | Θ(x | θ) = (2πσ²)^(−N/2) exp[ −(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − θ)² ]

and x = [x[0], x[1], . . . , x[N − 1]]ᵀ. A sufficient statistic for θ is x̄ = (1/N) Σ_{n=0}^{N−1} x[n] and the pdf of x̄ given Θ = θ is

fX̄ | Θ(x̄ | θ) = N(x̄ | θ, σ²/N).

Our GLR test accepts H1 if

ΛGLR(x) = max_{θ∈spΘ(1)} fX̄ | Θ(x̄ | θ) / fX̄ | Θ(x̄ | 0) > τ.

Now,

x̄ = arg max_{θ∈spΘ(1)} fX̄ | Θ(x̄ | θ)



and

fX̄ | Θ(x̄ | 0) = N(x̄ | 0, σ²/N) = (1/√(2πσ²/N)) exp[ −½ x̄²/(σ²/N) ]

fX̄ | Θ(x̄ | x̄) = N(x̄ | x̄, σ²/N) = 1/√(2πσ²/N)

yielding

ln ΛGLR(x) = N x̄²/(2σ²).

Therefore, we accept H1 if

x̄² > γ

or

|x̄| > η.
We compare this detector with the (not realizable, also called clairvoyant) UMP detector that assumes knowledge of the sign of θ under H1. Assuming that the sign of θ under H1 is known, we can construct the UMP detector, whose ROC curve is given by

PD = Q( Q⁻¹(PFA) − d )

where d = √(N θ²/σ²) and θ is the value of the parameter under H1; see (23) for the case where θ > 0 under H1. All other detectors have PD below this upper bound.



GLR test: decide H1 if |x̄| > η. To make sure that the GLR test is implementable, we must be able to specify a threshold η so that the false-alarm probability is upper-bounded by a given size α. This is possible in our example:

PFA(φ(x), 0) = PrX̄ | Θ{|X̄| > η | 0}   [see (26)]
             = 2 PrX̄ | Θ{X̄ > η | 0}   (by symmetry)
             = 2 Q( η/√(σ²/N) )

PD(φ(x), θ) = PrX̄ | Θ{|X̄| > η | θ}   [see (26)]
            = PrX̄ | Θ{X̄ > η | θ} + PrX̄ | Θ{X̄ < −η | θ}
            = Q( (η − θ)/√(σ²/N) ) + Q( (η + θ)/√(σ²/N) )
            = Q( Q⁻¹(α/2) − θ/√(σ²/N) ) + Q( Q⁻¹(α/2) + θ/√(σ²/N) ).
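These expressions can be evaluated with a short sketch (helper names are ours); setting PFA = α gives η = √(σ²/N) Q⁻¹(α/2):

```python
from math import sqrt
from statistics import NormalDist

_nd = NormalDist()

def Q(x):
    """Right-tail probability of a standard normal."""
    return 1.0 - _nd.cdf(x)

def Qinv(p):
    """Inverse of Q."""
    return _nd.inv_cdf(1.0 - p)

def glr_two_sided(alpha, sigma2, N):
    """Threshold eta with 2 Q(eta/sqrt(sigma2/N)) = alpha, and the
    detection probability PD(theta) from the display above."""
    s = sqrt(sigma2 / N)
    eta = s * Qinv(alpha / 2.0)
    def pd(theta):
        return Q(Qinv(alpha / 2.0) - theta / s) + Q(Qinv(alpha / 2.0) + theta / s)
    return eta, pd
```

At θ = 0 the expression reduces to PD = α, confirming that the size is attained exactly.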

In this case, the GLR test is only slightly worse than the clairvoyant detector (Figure 6.4 in Kay-II).



Example: DC level in WGN with θ and σ² both unknown. Recall that σ² is called a nuisance parameter since we care exclusively about θ. Here, the GLR test for

H0 : θ = 0   i.e. θ ∈ spΘ(0) = {0}   versus
H1 : θ ≠ 0   i.e. θ ∈ spΘ(1) = (−∞, +∞)\{0}

accepts H1 if

ΛGLR(x) = max_{θ,σ²} fX | Θ,Σ²(x | θ, σ²) / max_{σ²} fX | Θ,Σ²(x | 0, σ²) > γ



where

fX | Θ,Σ²(x | θ, σ²) = (2πσ²)^(−N/2) exp[ −(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − θ)² ].    (30)
Here,

max_{θ,σ²} fX | Θ,Σ²(x | θ, σ²) = e^(−N/2) / [2π σ̂₁²(x)]^(N/2)
max_{σ²} fX | Θ,Σ²(x | 0, σ²) = e^(−N/2) / [2π σ̂₀²(x)]^(N/2)

where

σ̂₀²(x) = (1/N) Σ_{n=0}^{N−1} x²[n]
σ̂₁²(x) = (1/N) Σ_{n=0}^{N−1} (x[n] − x̄)².

Hence,

ΛGLR(x) = [ σ̂₀²(x) / σ̂₁²(x) ]^(N/2)

i.e. the GLR test fits the data with the "best" DC-level signal θ̂ML = x̄, finds the residual variance estimate σ̂₁², and compares this estimate with the variance estimate σ̂₀² under the null case (i.e.


for θ = 0). When a sufficiently strong signal is present, σ̂₁² ≪ σ̂₀² and ΛGLR(x) ≫ 1.
Note that

σ̂₁²(x) = (1/N) Σ_{n=0}^{N−1} (x[n] − x̄)²
        = (1/N) Σ_{n=0}^{N−1} ( x²[n] − 2 x̄ x[n] + x̄² )
        = (1/N) Σ_{n=0}^{N−1} x²[n] − 2 x̄² + x̄²
        = σ̂₀²(x) − x̄².

Hence,

2 ln ΛGLR(x) = N ln[ σ̂₀²(x) / (σ̂₀²(x) − x̄²) ] = N ln[ 1 / (1 − x̄²/σ̂₀²(x)) ].

Note that

0 ≤ x̄²/σ̂₀²(x) ≤ 1

and ln[1/(1 − z)] is monotonically increasing on z ∈ (0, 1). Therefore, an equivalent test can be constructed as follows:

T(x) = x̄²/σ̂₀²(x) > τ.
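Computing the equivalent statistic from data is a one-liner; this sketch (our helper name) also illustrates that 0 ≤ T(x) ≤ 1, since the mean of squares σ̂₀²(x) is never smaller than the squared mean x̄²:

```python
def glr_statistic(x):
    """T(x) = xbar^2 / sigma0hat^2 for the unknown-variance GLR test;
    sigma0hat^2 = (1/N) sum x[n]^2 is the null-case variance estimate."""
    N = len(x)
    xbar = sum(x) / N
    sigma0_sq = sum(v * v for v in x) / N
    return xbar * xbar / sigma0_sq
```

A constant record gives T = 1 (pure DC, no residual), while a zero-mean record gives T = 0.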



The pdf of T(X) given θ = 0 does not depend on σ² and, therefore, the GLR test can be implemented, i.e. it is CFAR.

Definition. A test is constant false alarm rate (CFAR) if we can find a threshold that yields a test whose size is equal to α.

In other words, we should be able to set the threshold independently of the unknown parameters, i.e. the distribution of the test statistic under H0 does not depend on the unknown parameters.

