
Bayesian Econometrics: Applications in Economics & Finance

Topic 1: Overview of Bayesian Econometric Methods

Daniel Buncic
Sveriges Riksbank

August 20, 2018


Lecture Notes Version: [ibe1a]

Homepage
www.danielbuncic.com
Outline

Introduction, Intuition and Background
– History of Probability
– Different views of Probability
– Bayesian view of Probability
– Bayesian statistical modelling
– Some General Examples
– Main differences between Bayesian and Frequentist views
– Relation to other Estimation Methods

Some Model Based Examples
– Example 1: Binary Model
– Example 2: Poisson Model
– Example 3: Normal Model
– Example 4: Regression Model

References
History of Probability

Origins of Probability Theory


• the notion of probability had its origins in the 17th and very early 18th centuries (see
Chapter 1 in DeGroot and Schervish (2010) and Chapter 3 in Spanos (1986))
• ‘classical’ definitions of probability were formulated by Jacob Bernoulli in Ars
Conjectandi (1713) and Abraham De Moivre in The Doctrine of Chances (1718)
• interest in probability theory arose as a consequence of the (notorious)
gambling activities in France at that time
• games of chance, and with them probability theory, became a popular subject of
study as a consequence
• one of the major obstacles in developing a mathematical theory of probability
has been to arrive at a definition of probability that is precise enough for use in
mathematics
• yet the definition needs to be comprehensive enough to be applicable to a wide range of
phenomena and views on probability

• a widely acceptable definition of probability took another two centuries to be
developed and was marked by much controversy
• the matter was finally resolved in the 20th century by treating probability
theory on an axiomatic basis
• in ”Grundbegriffe der Wahrscheinlichkeitsrechnung” (1933), Andrei
Kolmogorov outlined an axiomatic approach that forms the basis for the
modern theory of probability
• since then, the ideas have been refined somewhat and probability theory is now
part of a more general mathematical discipline, measure theory
• a historical overview of the sources of Kolmogorov’s axioms is given in Shafer
and Vovk (2006)

Definition (Kolmogorov’s Three Axioms of Probability)

Let (Ω, F, P) be a probability space, with Ω the sample space, F the event space and
P the probability measure, and let Ai denote the events in F.
1) P(Ai) ≥ 0 and P(Ai) is a real number (ie., P(Ai) ∈ R)
2) P(Ω) = 1 (needed for P(·) to be a valid probability measure)
3) if A1, A2, . . . , An is a sequence of n mutually exclusive events, ie., events with no
common elements (Ai ∩ Aj = ∅ for i ≠ j), then

P(A1 ∪ A2 ∪ · · · ∪ An) = P(A1) + P(A2) + · · · + P(An).

(adapted from the English translation of Kolmogorov (1933), page 2.)

Corollary (to Kolmogorov’s Three Axioms of Probability)

1) Let Aᶜ be the complement of event A, so that A ∪ Aᶜ = Ω. Then it follows that

P(A) + P(Aᶜ) = 1  ⇔  P(Aᶜ) = 1 − P(A).    (1)

2) If P(Aj) ≠ 0, then the conditional probability of Ai given Aj is defined as

P(Ai|Aj) = P(Ai ∩ Aj)/P(Aj).    (2)

3) Events Ai and Aj are said to be statistically independent if

P(Ai ∩ Aj) = P(Ai)P(Aj).    (3)

(adapted from the English translation of Kolmogorov (1933), pages 6-10.)
Different views of Probability

What is Probability?
There exists no agreement on a formal definition or interpretation of probability!
• all that we have is a set of rules (axioms) that define the mathematical
properties of probability and its building blocks

There are three common operational interpretations of probability:

1) relative frequency approach
– empirical frequencies in a large number of trials n have an ”on average” or
”long-run” interpretation; with nA the # of times event A is observed (see the
simulation sketch below):

P(A) = lim_(n→∞) nA/n.    (4)

2) combinatorial (classical) definition
– N mutually exclusive/equally likely outcomes, with NA the # of times event A can occur:

P(A) = NA/N.    (5)

3) subjective interpretation
– probability as meaning the ”degree of belief” in an event held by an individual,
– whatever way it is formulated, ie., based on past info, combinatorial probabilities, etc.
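A minimal Matlab sketch of the ”long-run” idea in (4) (all variable names are illustrative): simulate fair-coin tosses and track the running relative frequency nA/n, which settles near 0.5 as n grows.

% simulate n tosses of a fair coin and track the relative frequency of heads
rng(1);                             % fix the seed for reproducibility
n       = 1e5;                      % number of trials
tosses  = rand(n,1) < 0.5;          % 1 = heads, with probability 0.5
relfreq = cumsum(tosses)./(1:n)';   % n_A/n after each toss
disp(relfreq([10 100 1000 n]))      % converges towards P(A) = 0.5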


• the ”degree of belief” view is thus a quite general concept and allows for a
”subjective” as well as a ”classical” definition of probability
• the ”classical” definition of probability was formulated by Jacob Bernoulli in Ars
Conjectandi (1713) and Abraham De Moivre in The Doctrine of Chances (1718)
• the conditions for each trial or experiment have to remain the same for the
”relative frequency” view of probability to be valid
• what does lim_(n→∞) mean operationally? if only a finite number of trials can
ever be conducted, then the ”relative frequency” view of probability cannot
be made operational
• the ”frequentist” interpretation of probability cannot be applied to situations
where there is no precedent for the event that one is interested in, or when the
conditions of the experiment change
• the ”combinatorial” interpretation is not feasible when there is an infinite set
of possible equally likely outcomes

Example (Coin Tossing)

Suppose that you are interested in deducing the ”fairness” of a particular type of coin
that someone gave to you, ie., whether it has probability of heads = tails = 0.5.
The frequentist view of probability would take the long-run relative frequency of
heads as an indication of the probability of heads.
What happens if the coin that is used in the experiment is switched after n tosses to
some other coin? Can we still use the relative frequency view of probability to
analyse whether the coin is fair or not?
There are many arguments that point out that the conditions for an experiment of
this sort are very difficult to keep constant, so that there is a violation of the basic
assumptions behind the relative frequency definition of probability in (4).

Example (Fund performance)

Suppose that you have to decide where to invest a given amount of money and you
review a group of fund managers based on their past performance. Given the past
performance, you can create a ranking based on which manager had the highest
(risk adjusted) return.
• what are you implicitly assuming when you apply this sort of reasoning to
figure out where to invest your money?
• what set of conditions is violated when you think about a frequentist view of
the outcomes?
• you often see the caveat ”past performance is not an indication of future
performance” in the brochures or on the websites of fund managers. What does
that mean in this context?
Bayesian view of Probability

Thinking differently about probability

• rather than thinking of probabilities as the relative frequencies of outcomes in a
random experiment, think of probabilities as the ”degree of belief” in a
proposition
• this proposition does not need to be thought of within the context of a random
experiment or random variables (RVs) in general
– Mr X killed Mr Y, and the court has to assess how probable it is that this
proposition is true, given the evidence
– Sweden would have won the world cup, if ...
– the signature on a particular (individual) check is not genuine
– Mr Z has a genetic disease, given the result of a test he has taken
– etc.

• most people are likely to use probability within a relative frequency context of
observed outcomes from the past
• but probability is very often also based on the subjective view, that is, the
”degree of belief” view, where a probability can be assigned to a (personal or
individual) proposition without it ever having been observed
• ”degrees of belief” can be mapped onto probabilities if they satisfy simple
rules of consistency, which are known as the Cox axioms (Cox, 1946)
• the Cox axioms of probability ensure that if two people make the same prior
assumptions and are given the same data, then they will draw identical
conclusions
• this more general view of probability used to quantify beliefs is known as the
Bayesian viewpoint, or as the subjective interpretation of probability

Probabilistic modelling as an inversion problem

• the purpose of statistical modelling is fundamentally an inversion problem
(Robert (2007), page 8)
• we try to deduce the causes (as captured by the parameters of the probabilistic
data generating process) from the effects (which are contained in the sample
data)
• statistical methods allow us to draw from the sample data an inference about
the unknown parameter vector θ
• probabilistic modelling allows us to characterise the behaviour of (future or
out-of-sample) observations conditional on the model and the parameter vector θ.

• the inversion property of statistics is evident in the notion of the likelihood
function L(θ|x) and the probability density function (PDF) p(x|θ), with x a
vector of i.i.d. sample data from random variable X, where

L(θ|x) = p(x|θ) = ∏ᴺᵢ₌₁ p(xi|θ)

• thus, given θ, we can plot a PDF for different values of x,
• and given x, we can plot a likelihood function for different values of θ.

A general definition of the inversion problem is given by Bayes’ Theorem.

Theorem (Bayes’ Theorem)

If A and B are two events such that P(B) ≠ 0, then

P(A|B) = P(B|A)P(A)/P(B)    (6)

P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|Aᶜ)P(Aᶜ)]    (7)

Bayes’ Theorem follows directly from the conditional probability corollary in (2).
By definition we have

P(B|A)P(A) = P(B ∩ A)

and also

P(B) = P(B ∩ A) + P(B ∩ Aᶜ)
     = P(B|A)P(A) + P(B|Aᶜ)P(Aᶜ)    (8)

which, together with (6), results in (7).

Bayes’ Theorem provides the basis for conducting Bayesian statistical inference.
Bayesian modelling combines, in an optimal way, all relevant sample information and
prior beliefs by using Bayes’ Theorem.
Bayesian statistical modelling

Definition (Likelihood Principle)

Suppose there are two experiments, E1 := (x1, θ, {p1(x1|θ)}) and
E2 := (x2, θ, {p2(x2|θ)}), where θ is the unknown parameter of interest (the same in
both experiments), and xi and pi(xi|θ) are the vectors of observed sample data and
corresponding PDFs for experiments i = 1, 2, respectively. Suppose that

L(θ|x1) = C(x1, x2) L(θ|x2), ie., L(θ|x1) ∝ L(θ|x2)

for all θ, where C(x1, x2) is constant with respect to θ; then the information content
in the two likelihoods L(θ|x1) and L(θ|x2) with respect to θ is the same.

(adapted from Casella and Berger (2001) pages 293-294.)

The Likelihood Principle states that all of the relevant information that is available in
a sample of data is contained in its likelihood function.

The Bayesian Statistical Model can then be defined as follows:

Definition (Bayesian Statistical Model)


A Bayesian statistical model is constructed from a parametric statistical model
p(x|θ) (the likelihood function) and a prior distribution on the parameters π(θ).
(adapted from Robert (2007) page 9.)


Optimal updating of prior beliefs and Bayes’ Theorem

• combining the definitions of the Likelihood Principle, Bayes’ Theorem and
the Bayesian Statistical Model, we get the following relation:

p(θ|x) = p(x|θ)π(θ)/p(x)
       = p(x|θ)π(θ) / ∫θ p(x|θ)π(θ)dθ    (9)
p(θ|x) ∝ L(θ|x)π(θ)

where
– ∝ is the ”proportional to” operator
– p(θ|x) is the posterior probability of θ after observing the data
– L(θ|x) ≡ p(x|θ) is the likelihood function, which is equal to the PDF of x given θ
– π(θ) is the prior
– p(x) ≠ 0 is the marginal density of the data, which does not depend on θ and is
assumed to exist

The relation in (9) is the key relation and describes how we proceed under a
Bayesian modelling approach.
This can be summarised as follows:
1) given your current knowledge (or the information available to you), form a prior
belief about the proposition of interest and express it with π(θ)
2) collect sample data and formulate a (parametric) statistical model, expressed
as the likelihood L(θ|x)
3) update your prior beliefs expressed by π(θ) with the new ”information” that
is available and contained in the likelihood L(θ|x), using the rule in (9) as:

p(θ|x) ∝ L(θ|x)π(θ)

This way of updating a rational person’s (personal or subjective) beliefs about the
parameter vector θ as new information becomes available is known as Bayesian
Updating.
• Cox (1946) and Savage (1954) proved that if p(x|θ) and π(θ) represent a
rational person’s beliefs about the data and the prior, then using Bayes’
Theorem to update one’s beliefs is optimal from a decision theoretic
perspective
• thus, from the viewpoint of decision theory, Bayesian Updating is the only
optimal way to update our beliefs about a proposition, and the updating
should be done using Bayes’ Theorem
• more background on the information/decision theoretic foundations of
Bayesian statistics can be found in Savage (1954), Cox (1946) and in the more
recent treatments in Bernardo and Smith (1994) and Robert (2007).
Some General Examples of Bayesian Probability

Example (Simple Scenario S)

Suppose you are interested in a scenario S that may arise due to (only) two different
parameter (or model) settings θ1 and θ2. Also, someone tells you (ie., the data via
the likelihood) that scenario S is more likely under θ2 than under θ1, so that the
conditional probabilities are P(S|θ1) = 0.1 and P(S|θ2) = 0.6.
Further, you assign equally likely prior probabilities for the θi, P(θ1) = P(θ2) = 0.5. Now,
P(S) = P(S|θ1)P(θ1) + P(S|θ2)P(θ2) = 0.1 × 0.5 + 0.6 × 0.5 = 0.35.
Using Bayes’ Theorem, we can work out the posterior beliefs as:

P(θ1|S) = P(S|θ1)P(θ1)/P(S) = (0.1 × 0.5)/0.35 = 1/7
P(θ2|S) = P(S|θ2)P(θ2)/P(S) = (0.6 × 0.5)/0.35 = 6/7

thus, after updating our beliefs, θ2 is 6 times more likely than θ1.
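The same arithmetic in a few lines of Matlab (a sketch of the numbers above; variable names are illustrative):

pS_given = [0.1 0.6];          % P(S|theta_1) and P(S|theta_2)
prior    = [0.5 0.5];          % prior P(theta_1) = P(theta_2) = 0.5
pS       = pS_given*prior';    % P(S) = 0.35 (law of total probability)
post     = pS_given.*prior/pS  % posterior beliefs [1/7, 6/7]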


Example (Individual Probability)

Suppose Bob has to take a test for some disease. Let D = 1 if Bob has the disease
and 0 otherwise, and let T = 1 if Bob returns a positive test result and 0 otherwise.
The test is 95% reliable, meaning that 95% of people who have the disease return
a positive test result (P(T = 1|D = 1) = 95%) and also 95% of people who do not
have the disease return a negative test result (P(T = 0|D = 0) = 95%).
Also, 1% of people that have similar characteristics to Bob (ie., age, etc.) have the
disease (P(D = 1) = 1%).
Suppose now that Bob gets a positive test result (T = 1); what is the probability that
Bob actually has the disease (D = 1)?
Frequentist view: P(D = 1|T = 1) = 95% (95% accuracy)

Example (Individual Probability cont.)

Bayesian view: use the prior P(D = 1) and update based on the test result. That is,

P(D = 1|T = 1) = P(T = 1, D = 1)/P(T = 1)
               = P(T = 1|D = 1)P(D = 1) / [P(T = 1|D = 1)P(D = 1) + P(T = 1|D = 0)P(D = 0)]
               = 0.95 × 0.01 / (0.95 × 0.01 + 0.05 × 0.99)
               = 0.161

Thus, despite the positive test result, the (posterior) probability that Bob has the
disease given that he tested positive is only 16.1%.
This result is very different from the typical frequentist interpretation.
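A minimal Matlab sketch reproducing this posterior calculation:

pT1_D1 = 0.95;  pT0_D0 = 0.95;  pD1 = 0.01;  % test reliability and prior
pT1_D0 = 1 - pT0_D0;                         % false positive rate = 0.05
pT1    = pT1_D1*pD1 + pT1_D0*(1 - pD1);      % P(T = 1) = 0.059
pD1_T1 = pT1_D1*pD1/pT1                      % = 0.1610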

Main differences between Bayesian and Frequentist views

Textbook style arguments of how views on probability differ

Frequentist
• all probabilities are in a repeated sampling context, not tailored to the individual
or to an individual proposition
• this becomes most evident when interpreting a (say) 95% confidence interval
(CI)
• a 95% CI does not mean that there is a 95% chance that the true parameter is
within this interval
• the 95% CI only holds in a repeated sampling context!, ie., as the # of trials → ∞
• Frequentists view the population parameter of interest as fixed, while the
estimate is a random variable

Bayesian

• a 95% interval constructed from the posterior density (or posterior probabilities)
can be interpreted directly, eg., ”there is a 16.1% chance that Bob has the
disease”
• Bayesians view the population parameter as a random variable itself
• the MLE point estimate, once the data have been observed, is fixed or
non-random given the sample data
Relation to other Estimation Methods

Other Estimation Methods

• the three main estimation methods that you have encountered so far are
– OLS
– GMM
– MLE
• each has a different set of assumptions attached to it
– OLS needs E(εtXt) = 0 for consistency
– GMM requires the instruments used in the moment conditions to be non-weak for
consistency
– MLE requires a correct specification of the model and hence of the likelihood
function
• we know that MLE is the most efficient one, as its variance attains the
Cramér-Rao lower bound of the inverse of the information matrix
• OLS and GMM can be as efficient under certain conditions

How does Bayesian estimation relate to any of these other methods?

• Bayesian estimation (in general) requires a fully specified parametric model
– we need to supply p(x|θ) = L(θ|x), that is, the density function (and hence the
likelihood function) of the data that we are analysing
– this is the same as MLE, thus the Bayesian approach is also a full information
approach
– any errors that can come into MLE will thus also be present in Bayesian estimation
• the advantage of the Bayesian approach over MLE is that there are two sources
of information
– the likelihood p(x|θ) = L(θ|x) and
– the prior π(θ)
– any identification problems that can occur in parametric models in general can be
dealt with by using priors as extra sources of information
– this is how DSGE models are estimated when weakly identified or unidentified
parameters are present in the model

• identification through priors is thus another constraint (or structure) that is
imposed on the model from outside
• it is an assumption that is imposed
– this may not be consistent with the data
– there is no way to use the data to test whether this prior is supported or not, as
one would normally do in a frequentist (MLE) set-up to see if a restriction is valid
• priors can thus be very powerful, but also potentially another source of error
that comes into the estimation procedure
• priors can solely determine the shape and mode of the posterior if the
likelihood function is flat (ie., if the data contain no information about the
model parameters)
• but, as the sample size goes to infinity, the Bayesian point estimator converges
to the MLE.
Some Model Based Examples
Example 1: Binary Model

Binary Model
Suppose our variable of interest is Y, which is binary, so that

p(y|θ) = θ^y (1 − θ)^(1−y),  y = 0, 1 and θ ∈ [0, 1]    (10)

and we observe the sample {yi}ᴺᵢ₌₁ with N = 130 and ∑ᴺᵢ₌₁ yi = 100, so that the
number of times yi = 1 is 100 and hence the number of times yi = 0 is 30.
Assuming that the data are i.i.d., we can form the joint density function for
y = (y1, . . . , yN) as

p(y|θ) = ∏ᴺᵢ₌₁ θ^(yi) (1 − θ)^(1−yi)    (11)
       = θ^(∑ᴺᵢ₌₁ yi) (1 − θ)^(∑ᴺᵢ₌₁ (1−yi))    (12)
       = θ^(N ȳ) (1 − θ)^(N(1−ȳ))    (13)

where ȳ = N⁻¹ ∑ᴺᵢ₌₁ yi.

Frequentist MLE
The MLE would maximise the likelihood function L(θ|y) = p(y|θ) w.r.t. θ.

Figure 1: Plot of L(θ|y) for θ ∈ [0, 1].

Analytically, we would get the MLE of θ by setting the derivative of L(θ|y) = p(y|θ)
w.r.t. θ to zero. As always with MLE, we actually work with the log-likelihood
ln(L(θ|y)) and set its derivative to zero, that is:

∂ ln(L(θ|y))/∂θ = 0
∂[N ȳ ln(θ) + N(1 − ȳ) ln(1 − θ)]/∂θ = 0
N ȳ/θ = N(1 − ȳ)/(1 − θ)
N ȳ − N ȳθ = θN − θN ȳ
θ̂MLE = ȳ = 100/130 = 0.7692.

The variance of θ̂MLE can be found as the inverse of the information matrix I(θ), where

I(θ) = −E[∂²[N ȳ ln(θ) + N(1 − ȳ) ln(1 − θ)]/∂θ²]
     = −E[−N ȳ θ⁻² − N(1 − ȳ)(1 − θ)⁻²]
     = N E(ȳ) θ⁻² + N(1 − E(ȳ))(1 − θ)⁻²,  with E(ȳ) = θ
     = N θ⁻¹ + N(1 − θ)⁻¹
     = N[θ⁻¹ + (1 − θ)⁻¹]
     = N[θ(1 − θ)]⁻¹

so Var(θ̂MLE) = I(θ)⁻¹ = θ(1 − θ)/N which, evaluated at θ̂MLE = 100/130, yields
0.001365.
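Numerically, with the sample information above (a Matlab sketch):

N = 130;  ybar = 100/130;                % sample size and sample mean
theta_mle = ybar                         % = 0.7692
var_mle   = theta_mle*(1 - theta_mle)/N  % = 0.001365 (inverse information)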


Bayesian
Under a Bayesian approach we need to specify a prior π(θ) and then compute the
posterior p(θ|y) from the relation

p(θ|y) = p(y|θ)π(θ)/p(y) = L(θ|y)π(θ)/p(y)    (14)

We know that θ ∈ [0, 1]. If we are uncertain about what the most likely value of θ is
before seeing the data, we can assign an uninformative prior, so that θ has equal
probability of falling anywhere in the interval [0, 1].
We can let θ ∼ Uniform(a, b), with a = 0 and b = 1.

Recall that a continuous Uniform RV X ∈ (a, b) has PDF

p(x|a, b) = 1/(b − a) if a ≤ x ≤ b, and 0 otherwise.    (15)

With a = 0 and b = 1 we get the Uniform prior for θ as π(θ) = 1. From (14) we can
then form the posterior

p(θ|y) = L(θ|y)π(θ)/p(y)    (16)
       = θ^(N ȳ)(1 − θ)^(N(1−ȳ)) / p(y)    (17)
       ∝ θ^(N ȳ)(1 − θ)^(N(1−ȳ))    (18)

where the right hand side of (18) is the kernel.
What kind of density is (18)?

A trick that we will frequently employ is to look at the numerator of (17) only,
without considering the marginal density of the data, p(y), at all.
• the right hand side of (18) is often referred to as the kernel of a density
• the kernel of a density determines its shape, and hence is all that matters
– but the kernel does not integrate to 1 as a proper PDF should, but to some other
constant, which we can call C, ie.,

∫θ θ^(N ȳ)(1 − θ)^(N(1−ȳ)) dθ = C ≠ 1.

– so the kernel tells us about the type of density we are dealing with
– and the kernel divided by C gives a proper density, so that it integrates to 1
• the posterior is a proper density, so it must integrate to 1, ie.,

∫θ p(θ|y)dθ = 1.

Because p(θ|y)p(y) = L(θ|y)π(θ), we thus know that

∫θ p(θ|y)p(y)dθ = ∫θ θ^(N ȳ)(1 − θ)^(N(1−ȳ)) dθ
p(y) ∫θ p(θ|y)dθ = ∫θ θ^(N ȳ)(1 − θ)^(N(1−ȳ)) dθ

and since ∫θ p(θ|y)dθ = 1, it follows that

p(y) = C.    (19)

• once we know what kind of posterior density we are dealing with, we can work
out the normalising constant C easily.

Recall that RV Z ∈ [0, 1] ⊂ R has a Beta(α, β) distribution if its PDF is

p(z|α, β) = [1/B(α, β)] z^(α−1)(1 − z)^(β−1),  ∀α, β > 0,    (20)

where both α and β are shape parameters, z^(α−1)(1 − z)^(β−1) is the kernel, and

B(α, β) = ∫z z^(α−1)(1 − z)^(β−1) dz = Γ(α)Γ(β)/Γ(α + β),  ∀α, β > 0,

with Γ(α) = (α − 1)Γ(α − 1) = (α − 1)! (! is the factorial operator), which is the
standard Gamma function from calculus.

Let us now compare the kernel of (18) with that of (20).

Note that these are the same!
We have θ = z, α − 1 = N ȳ and β − 1 = N(1 − ȳ), so that the posterior is

p(θ|y) = Beta(α, β) = Beta(N ȳ + 1, N(1 − ȳ) + 1)    (21)

and the term p(y) = C from (19) is
∫ θ^(α−1)(1 − θ)^(β−1) dθ = Γ(α)Γ(β)/Γ(α + β) = B(α, β).

From our earlier numerical values we have:
• α = N ȳ + 1 = 101
• β = N(1 − ȳ) + 1 = 31
• α + β = N ȳ + 1 + N(1 − ȳ) + 1 = N + 2 = 132

so that

C = Γ(101)Γ(31)/Γ(132)
  = 100! × 30!/131!
  = 2.9221e−032  (a pretty small number!)

[the Matlab command beta(101,31) computes what we need]

Why do we get this small number?
Recall that p(θ|y) = L(θ|y)π(θ)/p(y).
Under our setting we had π(θ) = 1, and with p(y) = C = 2.9221e−032 we get

p(θ|y) = L(θ|y)/p(y) = L(θ|y)/C

• thus, the posterior is just a re-scaled version of the likelihood function, with the
same shape as the likelihood function!
• this is due to the prior being π(θ) = 1
• the maximum of L(θ|y) was at 3.1696e−031 (see Figure 1)
• the maximum of the posterior (ie., at the mode of θ|y) thus has to be

max(p(θ|y)) = max(L(θ|y))/C = 3.1696e−031/2.9221e−032 ≈ 10.8469

which we can see from Figure 2 below.
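A quick numerical check of this re-scaling in Matlab (a sketch; the grid is illustrative):

N = 130;  ybar = 100/130;
theta = linspace(0.001, 0.999, 999);                   % grid over [0,1]
L     = theta.^(N*ybar).*(1 - theta).^(N*(1 - ybar));  % likelihood L(theta|y)
C     = beta(101, 31);                                 % p(y) = 2.9221e-032
post  = L/C;                                           % Beta(101,31) posterior
[max(L) max(post)]                                     % ~ [3.1696e-031, 10.8469]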


Figure 2: Plot of the posterior p(θ|y) and the prior π(θ) = 1.

What do we do with the posterior?

• note that the posterior contains all the probabilistic information about θ once
we have observed the data y!
• so we can compute any moments of interest, quantiles, tail probabilities, etc.
from the posterior.

We know that

p(θ|y) = Beta(α, β)    (22)

where α = N ȳ + 1 and β = N(1 − ȳ) + 1.
From the moments of a Beta distribution we know that, if θ|y ∼ Beta(α, β), then
• E(θ|y) = α/(α + β),
• Var(θ|y) = αβ/[(α + β)²(α + β + 1)], and
• mode(θ|y) = (α − 1)/(α + β − 2), if α, β > 1.

To get a Bayesian point estimate (at the centre) of θ|y from (22), we can look at
any of the three well known measures of central tendency, ie.,
1) the mode
2) the mean
3) or the median of the posterior (no closed form for the Beta PDF)
Which one we end up using depends on our loss function.
If a quadratic loss function is used, we get the standard result that the mean of the
posterior should be used, which is

E(θ|y) = α/(α + β)
       = (N ȳ + 1)/(N ȳ + 1 + N(1 − ȳ) + 1)
       = (N ȳ + 1)/(N + 2)
       = 101/132
       = 0.76515.

(see also Chapter 3 in Koop et al. (2007) for various types of loss functions)

If we use the posterior mode, then

mode(θ|y) = (α − 1)/(α + β − 2)
          = N ȳ/(N ȳ + N(1 − ȳ))
          = N ȳ/N
          = 100/130

so we get the same result as from θ̂MLE = 0.7692.

The posterior variance can easily be found from

Var(θ|y) = αβ/[(α + β)²(α + β + 1)]
         = (N ȳ + 1)(N(1 − ȳ) + 1)/[(N + 2)²(N + 3)]

which is 0.001351.
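The posterior moments can be verified in a couple of Matlab lines:

a = 101;  b = 31;                        % posterior Beta parameters
post_mean = a/(a + b)                    % = 101/132 = 0.76515
post_mode = (a - 1)/(a + b - 2)          % = 100/130 = 0.7692
post_var  = a*b/((a + b)^2*(a + b + 1))  % = 0.001351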


Use of priors
In the previous example we used a Uniform prior for θ
• so we effectively imposed no prior information on θ
• we may, however, want to express our prior beliefs more accurately through a
more informative prior
• all the prior in this example has to satisfy is the requirement that θ ∈ [0, 1] and
that θ is continuous, so any density that satisfies these conditions can be used, ie.,
– Beta, Minimax, Noncentral Beta, Standard Power distribution, etc. (see Leemis
and McQueston (2008) for more details on these distributions)
• the Beta distribution (and also the Noncentral Beta) is particularly interesting
because it is known to be a conjugate prior distribution for our Binary
model here.

Definition (Conjugate Prior)

A class P of prior distributions for θ is called a conjugate prior for the likelihood
based model p(y|θ) if
π(θ) ∈ P ⇒ p(θ|y) ∈ P,
that is, the prior and the posterior belong to the same class of distributions.
(adapted from Hoff (2009) page 38.)

Definition (Natural Conjugate Prior)

A prior distribution for θ is called a natural conjugate prior if it has the same
properties as a conjugate prior, with the additional requirement that the likelihood
function also belongs to the same class of distributions (ie., L(θ|y) = p(y|θ) ∈ P).
(see for example Koop (2003) page 18.)

Fact: Beta and Uniform densities

A Beta(α, β) density with α = β = 1 is a Uniform(a, b) density (on the interval
[0, 1]) with a = 0 and b = 1.
That is, a Beta distribution with α = β = 1 yields

p(θ|α, β) = [1/B(α = 1, β = 1)] θ^(1−1)(1 − θ)^(1−1) = 1,  ∀θ ∈ [0, 1]

where
B(α = 1, β = 1) = Γ(1)Γ(1)/Γ(2) = 1.

Using the more general (conjugate) Beta prior for the parameter of interest θ in this
set-up allows us to work with a model that still yields a Beta distribution as the
posterior.
• this is the main advantage of using conjugate priors: posteriors are available
in closed form (ie., analytically)
• the Beta distribution is very flexible and can take on many different shapes (see
Figure 3), and it approaches a Normal density as a limiting case as α, β → ∞
• it can thus replicate many different prior assumptions, where one can put flat
or uninformative priors as well as highly informative ones over a given region of
importance to the investigator
• for example, for AR(1) models, we can use the Beta prior to restrict the
parameter interval to [0, 1] with a peak at a value of, say, around 0.9, if we
know (or believe) a series has fairly high persistence
Figure 3: Beta density plots for different values of α and β; panel (a) shows general Beta
densities and panel (b) symmetric Beta densities (α = β).

The Beta prior in action

Using the same likelihood function as in (13) and combining it with a Beta prior for θ,
ie., θ ∼ Beta(α0, β0), where α0, β0 are the hyperparameters of the prior π(θ), we get:

p(θ|y) ∝ L(θ|y) × π(θ)
       ∝ θ^(N ȳ)(1 − θ)^(N(1−ȳ)) × θ^(α0−1)(1 − θ)^(β0−1)    [kernel of L(θ|y) × kernel of π(θ)]
       ∝ θ^(N ȳ+α0−1)(1 − θ)^(N(1−ȳ)+β0−1)
       ∝ θ^(ᾱ−1)(1 − θ)^(β̄−1)    [Beta(ᾱ, β̄) kernel]

θ|y ∼ Beta(ᾱ, β̄),    (23)

so the posterior distribution for θ will be a Beta(ᾱ, β̄) distribution, with

ᾱ = N ȳ + α0 and β̄ = N(1 − ȳ) + β0,

where ᾱ and β̄ are the posterior parameters of interest.
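A sketch of the updating formulas in Matlab, with illustrative hyperparameter choices α0 = β0 = 2 (a mildly informative prior centred at 0.5):

N = 130;  ybar = 100/130;
alpha0 = 2;  beta0 = 2;          % illustrative prior hyperparameters
abar = N*ybar + alpha0;          % posterior shape = 102
bbar = N*(1 - ybar) + beta0;     % posterior shape = 32
post_mean = abar/(abar + bbar)   % = 102/134 = 0.7612, shrunk towards the prior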



We can again compute moments, etc. of interest as before.

Note that we have to specify the prior hyperparameters α0 and β0 of π(θ), so this
creates another layer of flexibility as well as of ”ambiguity”.
The Beta conjugate prior result also holds if Y is a Binomial RV, ie., the number of
successes in a sequence of n independent trials of a binary outcome, with PDF:

p(y|θ, n) = (n choose y) θ^y (1 − θ)^(n−y),  y = 0, 1, 2, . . . , n,    (24)

as the kernel of (24) is still θ^y (1 − θ)^(n−y).

The same is true if Y is a Negative Binomial or Geometric RV (see Leemis and
McQueston (2008) for details regarding the PDFs).

Figure 4: Plots of different priors and posteriors for Binomial RV.

Some Model Based Examples
Example 2: Poisson Model

The Poisson Model

Suppose we have count data, such that random variable Y takes on the values
0, 1, 2, . . . . The density for observation i is a Poisson PDF, taking the form:

p(yi|θ) = θ^(yi) exp{−θ}/yi!,  ∀yi = 0, 1, 2, . . .

where {θ > 0} ∈ R. With an i.i.d. sample of size N, we can form the joint density
and likelihood as:

p(y|θ) = ∏ᴺᵢ₌₁ p(yi|θ)
       = ∏ᴺᵢ₌₁ θ^(yi) exp{−θ}/yi!
L(θ|y) = K θ^(N ȳ) exp{−θN},  where K = ∏ᴺᵢ₌₁ (1/yi!).

Frequentist MLE
The ML estimate is again obtained by setting the derivative of
ln(L(θ|y)) = ln(p(y|θ)) w.r.t. θ to zero, that is:

∂ ln(p(y|θ))/∂θ = 0
∂[N ȳ ln(θ) − N θ + ln(K)]/∂θ = 0
θN = N ȳ
θ̂MLE = ȳ

where ȳ is as before and K = ∏ᴺᵢ₌₁ (1/yi!) is a constant that does not depend on θ.

The variance of the ML estimate is I(θ)⁻¹, where

I(θ) = −E[∂²[N ȳ ln(θ) − N θ + ln(K)]/∂θ²]
     = −E[−N ȳ θ⁻²]
     = N E(ȳ) θ⁻²
     = N θ⁻¹

since E(ȳ) = N⁻¹ ∑ᴺᵢ₌₁ E(yi) = θ, because E(y) = θ = Var(y) for a Poisson RV.
So Var(θ̂MLE) = θ/N, where we would again replace θ by a consistent estimator such
as the MLE θ̂MLE to get an estimate of Var(θ̂MLE) as θ̂MLE/N.

Bayesian

Under a Bayesian approach, we again need to specify a prior π(θ) for θ.

• if we choose π(θ) = 1, this would again be an uninformative prior as before
• but {θ > 0} ∈ R, so ∫₀^∞ π(θ)dθ = ∞; the uninformative prior π(θ) = 1 thus
does not integrate to unity but rather to infinity, and is therefore known as an
improper prior
• we can still compute the posterior, but (because of the improper prior) we
cannot do model comparisons, such as Bayes factors etc.

The posterior would just be

p(θ|y) ∝ L(θ|y) × π(θ)
       ∝ θ^(N ȳ) exp{−θN} × ∏ᴺᵢ₌₁ (1/yi!) × 1    [kernel × some constant]
       ∝ θ^(N ȳ) exp{−θN}.    (25)

What does the kernel in (25) look like?

Recall that RV {Z > 0} ∈ R has a Gamma(α, β) density if its PDF is given by

p(z|α, β) = [1/(Γ(α)β^α)] z^(α−1) exp{−z/β},  ∀α, β > 0

where Γ(α) = ∫₀^∞ z^(α−1) exp{−z}dz = (α − 1)!, as before for the Beta distribution.
• in the Gamma class of densities, α and β are commonly referred to as the shape
and scale parameters

So the kernel in (25) is that of a Gamma(ᾱ, β̄) density with

ᾱ = N ȳ + 1 and β̄ = 1/N.

A Numerical Example

Suppose we have a sample of N = 100 and compute ∑ᴺᵢ₌₁ yi = 201 from the values
that we observe.
• then, the MLE of θ is just 2.01
• the posterior density p(θ|y) under π(θ) = 1 is proportional to θ^(N ȳ) exp{−θN}, ie.,

θ|y ∼ Gamma(N ȳ + 1, 1/N)

From the standard properties of a Gamma(α, β) RV we have:
• E(θ|y) = αβ
• Var(θ|y) = αβ²
• mode(θ|y) = (α − 1)β if α > 1, and 0 otherwise

If we are again interested in the mode as a point estimate, with posterior

θ|y ∼ Gamma(ᾱ, β̄)

we would compute

mode(θ|y) = (ᾱ − 1)β̄ = (N ȳ)/N = ȳ = 2.01

as our Bayesian point estimate of θ.

If we use the mean as the point estimate, it would be:

E(θ|y) = ᾱβ̄ = (N ȳ + 1)/N = 2.02.
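A Matlab sketch of these posterior quantities, including posterior draws via randg (relation 1) from the Gamma facts further below):

N = 100;  ybar = 2.01;              % sample information
abar = N*ybar + 1;  bbar = 1/N;     % Gamma(abar, bbar) posterior
post_mode = (abar - 1)*bbar         % = 2.01, equal to the MLE
post_mean = abar*bbar               % = 2.02
draws = bbar*randg(abar, 1e4, 1);   % posterior draws of theta|y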

Some Model Based Examples
Example 2: Poisson Model (adapted from Hoff (2009) pages 43-50)

Figure 5: Histogram of the observed Poisson (count) data (panel (a)) and posterior density
p(θ|y) (panel (b)) for N = 100 and ȳ = 2.01.

Results with a Conjugate prior

We saw from the previous result that using π(θ) = 1 is not only an uninformative
prior but also an improper one.
To stay in the Gamma class of posteriors, we need to specify a Gamma
conjugate prior for θ, that is:

π(θ) ∝ θ^(α0−1) exp{−θ/β0}    [kernel of Gamma(α0, β0)]

where we are again using the notation α0, β0 for the hyperparameters of the
Gamma(α0, β0) prior.

The posterior is then

p(θ|y) ∝ L(θ|y) × π(θ)
       ∝ θ^(N ȳ) exp{−θN} × θ^(α0−1) exp{−θ/β0}    [kernel of L(θ|y) × kernel of π(θ)]
       ∝ θ^(N ȳ+α0−1) exp{−θ(N + 1/β0)}.    (26)

The kernel of the posterior p(θ|y) in (26) can be recognised as a Gamma(ᾱ, β̄)
density with

ᾱ = N ȳ + α0 and β̄ = 1/(N + 1/β0).
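A sketch of this conjugate update in Matlab, with an illustrative prior choice α0 = 4, β0 = 0.5 (prior mean α0β0 = 2):

N = 100;  ybar = 2.01;
alpha0 = 4;  beta0 = 0.5;    % illustrative prior hyperparameters
abar = N*ybar + alpha0;      % = 205
bbar = 1/(N + 1/beta0);      % = 1/102
post_mean = abar*bbar        % = 2.0098, between the prior mean and the MLE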



Figure 6: Plots of prior, posterior and likelihood function for three different prior
hyperparameter values α0 and β0 , and N = 100, ȳ = 2.01.

Some Model Based Examples
Example 3: Normal Model

Normal Model (aka Location and Scale Model)

Suppose that we have an i.i.d. sample of size N from a Normal distribution, where
the PDF for observation i takes the form

p(yi|µ, σ²) = (2πσ²)^(−1/2) exp{−(yi − µ)²/(2σ²)},  ∀yi ∈ R

with (µ, σ²) ∈ R × R⁺.

The joint density and likelihood are respectively:

p(y|µ, σ²) = ∏ᴺᵢ₌₁ (2πσ²)^(−1/2) exp{−(yi − µ)²/(2σ²)}
L(µ, σ²|y) = (2πσ²)^(−N/2) exp{−(1/(2σ²)) ∑ᴺᵢ₌₁ (yi − µ)²}.

Frequentist MLE
The MLE for µ and σ² is obtained as before from

∂ ln L(µ, σ²|y)/∂µ = (1/σ²) ∑ᴺᵢ₌₁ (yi − µ) = 0
∑ᴺᵢ₌₁ yi = N µ
N⁻¹ ∑ᴺᵢ₌₁ yi = µ

so µ̂MLE = ȳ.

For σ² we get

∂ ln L(µ, σ²|y)/∂σ² = −(N/2)(σ²)⁻¹ + (1/2)(σ²)⁻² ∑ᴺᵢ₌₁ (yi − µ)² = 0
N(σ²)⁻¹ = (σ²)⁻² ∑ᴺᵢ₌₁ (yi − µ)²
N σ² = ∑ᴺᵢ₌₁ (yi − µ)²

so σ̂²MLE = N⁻¹ ∑ᴺᵢ₌₁ (yi − µ̂MLE)².

Let θ = (µ, σ²). The variance/covariance matrix of the MLE of θ is again obtained
from the inverse of the Fisher Information I(θ), where I(θ) = −E(H) and the
Hessian H is

H = ∂² ln(L(θ|y))/∂θ∂θ′

  = [ ∂² ln(L(θ|y))/∂µ∂µ      ∂² ln(L(θ|y))/∂σ²∂µ
      ∂² ln(L(θ|y))/∂µ∂σ²     ∂² ln(L(θ|y))/∂σ²∂σ² ]

  = [ −N/σ²                    −∑ᴺᵢ₌₁ (yi − µ)/σ⁴
      −∑ᴺᵢ₌₁ (yi − µ)/σ⁴       N/(2σ⁴) − ∑ᴺᵢ₌₁ (yi − µ)²/σ⁶ ].

Since Y ∼ Normal(µ, σ²), it follows that E(yi) = µ and E(yi − µ)² = σ², so that

I(θ) = −E(H)

     = [ N/σ²                      ∑ᴺᵢ₌₁ E(yi − µ)/σ⁴
         ∑ᴺᵢ₌₁ E(yi − µ)/σ⁴       −N/(2σ⁴) + ∑ᴺᵢ₌₁ E(yi − µ)²/σ⁶ ]

     = [ N/σ²     0
         0        −N/(2σ⁴) + N/σ⁴ ]

     = [ N/σ²     0
         0        N/(2σ⁴) ]

so that

Var(θ̂MLE) = I(θ)⁻¹ = [ σ²/N     0
                        0        2σ⁴/N ].

Bayesian
Under a Bayesian setting, we again need to specify a prior, but now for the joint
parameter vector θ, ie., π(θ) = π(µ, σ²). A few options exist here:
• one could use a conjugate prior for θ (we will see that later)
• another common one is the following combination for µ and σ² (related to
Jeffreys’ prior, which will be discussed later as well)
– assume µ and σ² are independent, hence

π(θ) = π(µ, σ²) = π(µ|σ²)π(σ²) = π(µ)π(σ²)

– then set a flat prior for µ, ie., π(µ) ∝ 1 (an uninformative and improper prior for µ)

– set also a flat prior for the log of σ², that is, let φ = ln σ², with π(φ) ∝ 1
– noting that π(σ²) = π(φ)|∂φ/∂σ²| = π(φ)(1/σ²) (from the RV transformation),
and with π(φ) ∝ 1, we get

π(θ) = π(µ, σ²)
     = π(µ|σ²)π(σ²)
     = π(µ)π(φ)(1/σ²)
     ∝ 1/σ².    (27)

Given π(θ) in (27) and the data density p(y|θ), the posterior becomes

p(θ|y) ∝ p(y|θ) × π(θ)
       ∝ (2πσ²)^(−N/2) exp{−(1/(2σ²)) ∑ᴺᵢ₌₁ (yi − µ)²} × (σ²)⁻¹
       ∝ (σ²)^(−(N/2+1)) exp{−(1/(2σ²)) ∑ᴺᵢ₌₁ (yi − µ)²}    (28)

where we can drop the terms involving (2π)^(−N/2), as they do not depend on
θ = (µ, σ²).
How do we use the posterior for the joint parameter vector θ in (28)?

Since we want to find the (marginal) posteriors for µ and σ², given the data, we need
to integrate out the other parameter that is not of interest to us, that is,

p(µ|y) = ∫σ² p(θ|y)dσ²  and  p(σ²|y) = ∫µ p(θ|y)dµ

in this setting.
Before we do that, note that

∑ᴺᵢ₌₁ (yi − µ)² = ∑ᴺᵢ₌₁ [(yi − ȳ) − (µ − ȳ)]²,  where ȳ = N⁻¹ ∑ᴺᵢ₌₁ yi
              = ∑ᴺᵢ₌₁ [(yi − ȳ)² − 2(yi − ȳ)(µ − ȳ) + (µ − ȳ)²]

              = ∑ᴺᵢ₌₁ (yi − ȳ)² − 2(µ − ȳ) ∑ᴺᵢ₌₁ (yi − ȳ) + N(µ − ȳ)²
              = SSE + N(µ − ȳ)²    (29)

since ∑ᴺᵢ₌₁ (yi − ȳ) = 0, and where SSE is the Sum of Squared Errors

SSE = ∑ᴺᵢ₌₁ (yi − ȳ)² = (N − 1)s² = νs²,

and

• s² is the standard (unbiased) estimate of the population variance, ie., the
sample variance, and
• ν is the degrees of freedom parameter.

Note that SSE is a function of the data only, and hence does not depend on the
parameter vector of interest θ!
Combining (28) and (29), we can re-express (28) as

p(θ|y) ∝ (σ²)^(−(N/2+1)) exp{−[SSE + N(µ − ȳ)²]/(2σ²)}    (30)

Rather than using the relation in (28) to integrate out the unwanted parameter, this
is commonly done with the relation in (30)!

Computing the marginal posterior for σ²

To get p(σ²|y) we integrate µ out of the joint posterior in (30) as follows:

p(σ²|y) = ∫µ p(µ, σ²|y)dµ
        ∝ ∫µ (σ²)^(−(N/2+1)) exp{−SSE/(2σ²)} exp{−N(µ − ȳ)²/(2σ²)} dµ
        ∝ (σ²)^(−(N/2+1)) exp{−SSE/(2σ²)} ∫µ exp{−N(µ − ȳ)²/(2σ²)} dµ

where the first exponential term does not depend on µ, and the remaining integrand
is the kernel of a Normal(ȳ, σ²/N) density.

Note that

∫µ exp{−N(µ − ȳ)²/(2σ²)} dµ = (2πσ²/N)^(1/2)

so that

p(σ²|y) ∝ (σ²)^(−(N/2+1)) exp{−SSE/(2σ²)} (2πσ²/N)^(1/2)
        ∝ (σ²)^(−((N−1)/2+1)) exp{−SSE/(2σ²)}.    (31)

The marginal posterior in (31) can be recognised as an Inverse Gamma density with
α = (N − 1)/2 and β = SSE/2, that is,

σ²|y ∼ InvGam((N − 1)/2, SSE/2).    (32)

A Quick Detour: Some Facts about Gamma RVs

Fact: Re-scaling a Gamma RV

If X ∼ Gamma(α, β) then Y = κX ∼ Gamma(α, κβ).

Proof:

p(y) = p(x)|∂x/∂y|
     = [Γ(α)]⁻¹ β^(−α) (y/κ)^(α−1) exp{−(y/κ)β⁻¹} κ⁻¹
     = [Γ(α)]⁻¹ (κβ)^(−α) y^(α−1) exp{−y(κβ)⁻¹},    (33)

which is the Gamma(α, κβ) density.

Fact: Re-scaling an Inverse Gamma RV

If X ∼ InvGam(α, β) then Y = κX ∼ InvGam(α, κβ).

Proof:

p(y) = p(x)|∂x/∂y|
     = [Γ(α)]⁻¹ β^α (y/κ)^(−(α+1)) exp{−β(y/κ)⁻¹} κ⁻¹
     = [Γ(α)]⁻¹ (κβ)^α y^(−(α+1)) exp{−(κβ)y⁻¹},    (34)

which is the InvGam(α, κβ) density.

Fact: Inverse of an Inverse Gamma RV

If X ∼ InvGam(α, β = 1) then Y = 1/X ∼ Gamma(α, β = 1).

Proof:

p(y) = p(x)|∂x/∂y|
     = [Γ(α)]⁻¹ (1/y)^(−(α+1)) exp{−(1/y)⁻¹} y⁻²
     = [Γ(α)]⁻¹ y^(α−1) exp{−y},    (35)

which is the Gamma(α, β = 1) density.

Note: The reason why we are using β = 1 in the Inverse of an Inverse Gamma RV
transformation is to avoid having to re-define the scale parameter as β = 1/b as was
done in the distributions notes.

Fact: Gamma and Chi-square


A Gamma(α = ν/2, β = 2) RV is a Chi-square RV with ν degrees of freedom (χ²ν).
In Matlab, 2*randg(ν/2, 1) ∼ Chi2(ν).


Summary and implementation in Matlab

If X ∼ Gamma(α, 1), then we have the following relations:

1) βX ∼ Gamma(α, β) [shape α, scale β], Matlab code: β*randg(α, T, 1)
2) 1/X ∼ InvGam(α, 1) [shape α, scale 1], Matlab code: 1./randg(α, T, 1)
3) β/X ∼ InvGam(α, β) [shape α, scale β], Matlab code: β./randg(α, T, 1)
4) 2X ∼ Chi2(2α) [shape α, scale 2], Matlab code: 2*randg(α, T, 1)

The last relation in 4) will generate a Chi2 RV with ν degrees of freedom if we draw
from 2*randg(α, T, 1) with α = ν/2 (ie., ν = 2α).
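As a usage example of relation 3), a sketch (with simulated data; all names are illustrative) of drawing from the marginal posterior σ²|y ∼ InvGam((N − 1)/2, SSE/2) derived in (32):

rng(1);
N   = 100;  y = 2 + randn(N,1);          % simulated sample, true sigma2 = 1
SSE = sum((y - mean(y)).^2);             % (N-1)*s^2
T   = 1e4;                               % number of posterior draws
sig2 = (SSE/2)./randg((N - 1)/2, T, 1);  % beta./randg(alpha,T,1), beta = SSE/2
[mean(sig2)  SSE/(N - 3)]                % Monte Carlo vs analytical posterior mean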


From (32) we had the marginal posterior for σ²

σ²|y ∼ InvGam((N − 1)/2, SSE/2),

where ν = N − 1 and SSE = νs². Using the results from above, we have the relations

(2σ²/SSE)|y ∼ InvGam((N − 1)/2, 1)
(SSE/(2σ²))|y ∼ Gamma((N − 1)/2, 1)
(SSE/σ²)|y ∼ Gamma((N − 1)/2, 2)
(SSE/σ²)|y ∼ Chi2(N − 1)

Setting SSE = (N − 1)s² we get

((N − 1)s²/σ²)|y ∼ Chi2(N − 1)
s²|y ∼ (σ²/(N − 1)) Chi2(N − 1).

Also, since E(z) = N − 1 when Z ∼ Chi2(N − 1), we get the standard result that

E(s²|y) = (σ²/(N − 1)) E[Chi2(N − 1)] = σ²,

so the sample variance s² is an unbiased estimator of the population variance.

To obtain a point estimate, we can again use the mean (or mode or median). Since
the marginal posterior for σ², given the data y, is

σ²|y ∼ InvGam(α = (N − 1)/2, β = SSE/2)

and E(z) = β/(α − 1) if RV Z ∼ InvGam(α, β), we get the posterior mean

E(σ²|y) = (SSE/2)/((N − 1)/2 − 1) = SSE/(N − 3),

while the posterior mode follows from mode(z) = β/(α + 1).

Recall, if RV Z ∼ InvGam(α, β), then mode(z) = β/(α + 1), so

mode(σ²|y) = (SSE/2)/((N − 1)/2 + 1) = SSE/(N + 1).

Notice here that mode(σ²|y) does not coincide with the maximiser of L(θ|y),
because π(θ) ∝ 1/σ² and not just a constant, so exact results differ because of the
prior on θ.
• but as N → ∞, this difference disappears and mode(σ²|y) → σ̂²MLE.

Marginal posterior for µ

Now, to get p(µ|y), we need to integrate σ² out of the posterior in (30), that is,

p(µ|y) = ∫σ² p(µ, σ²|y)dσ²
       ∝ ∫σ² (σ²)^(−(N/2+1)) exp{−[SSE + N(µ − ȳ)²]/(2σ²)} dσ²
       ∝ ∫σ² (σ²)^(−(α+1)) exp{−β/σ²} dσ²    (36)

where the integrand is the kernel of an InvGam(α, β) density, with α = N/2 and
β = [SSE + N(µ − ȳ)²]/2.

The InvGam kernel in (36) has to integrate to Γ(α)β^(−α) for it to correspond to a
proper density.

Thus (remember, the RV is now µ)

p(µ|y) ∝ Γ(α)β^(−α)
       ∝ Γ(N/2)([SSE + N(µ − ȳ)²]/2)^(−N/2)
       ∝ [SSE + N(µ − ȳ)²]^(−N/2)    (37)

where Γ(N/2) and the factor of 2 drop out, as they do not depend on µ.

With SSE = s²(N − 1) = ∑ᴺᵢ₌₁ (yi − ȳ)², also not depending on µ, we then get

p(µ|y) ∝ [SSE + N(µ − ȳ)²]^(−N/2)
       = [SSE + SSE · N(µ − ȳ)²/SSE]^(−N/2)    (38)
       = [SSE (1 + N(µ − ȳ)²/SSE)]^(−N/2)    (39)

       = SSE^(−N/2) [1 + N(µ − ȳ)²/SSE]^(−N/2)    (40)
       ∝ [1 + N(µ − ȳ)²/SSE]^(−N/2)
       = [1 + (1/ν) N(µ − ȳ)²/s²]^(−(ν+1)/2)
       = [1 + (1/ν)((µ − ȳ)/(s/√N))²]^(−(ν+1)/2)    (41)

where ν = N − 1.
The kernel in (41) is that of a (non-standard) Students’ t distribution with ν degrees
of freedom, location ȳ and scale s²/N.


Defining ω = √N(µ − ȳ)/s, we can compute the (standard) Students’ t distribution
for RV ω with PDF

p(ω|y) = p(µ|y)|∂µ/∂ω|
       ∝ [1 + ω²/ν]^(−(ν+1)/2) (s/√N)
       ∝ [1 + ω²/ν]^(−(ν+1)/2)

where E(ω|y) = 0, ∀ν > 1.

To get a Bayesian point estimate, we again look at some standard measures of
central tendency.
Given that µ|y ∼ Students(ν, ȳ, s²/N), we have

• E(µ|y) = ȳ
• mode(µ|y) = ȳ
• median(µ|y) = ȳ

We further have the standard result that, as ν → ∞,

t(ν, ȳ, s²/N) → Normal(ȳ, s²/N)  (in distribution)

so that, in large samples, µ|y is approximately Normal(ȳ, s²/N).

Note that since the prior on µ was ∝ 1, the posterior (Bayesian) point estimate is
the same as the MLE (µ̂MLE = ȳ).
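A sketch (with simulated data) of sampling µ|y by compounding: draw σ² from its InvGam posterior and then µ|σ², y ∼ Normal(ȳ, σ²/N); marginally this yields the Students’ t posterior in (41).

rng(1);
N = 100;  y = 2 + randn(N,1);              % simulated sample
ybar = mean(y);  SSE = sum((y - ybar).^2);
T    = 1e4;
sig2 = (SSE/2)./randg((N - 1)/2, T, 1);    % sigma2|y draws
mu   = ybar + sqrt(sig2/N).*randn(T,1);    % mu|sigma2,y draws
[mean(mu)  ybar]                           % posterior mean equals ybar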

Some Model Based Examples
Example 4: Plain Vanilla Regression Model

The Plain Vanilla Regression Model

Suppose we have an i.i.d. sample of size N from the regression model

yi = ∑ᵖⱼ₌₁ xij βj + εi,  ∀i = 1, . . . , N

where εi ∼ Normal(0, σ²).

In matrix form we have

y = Xβ + ε,    (42)

where ε ∼ MNormal(0(N×1), σ²I(N×N)), y is the (N×1) vector of observations on the
dependent variable, X is the (N×p) matrix of regressors with typical row
(xi1, xi2, . . . , xip), β is the (p×1) coefficient vector, and ε is the (N×1) error vector,

with (β, σ²) ∈ Rᵖ × R⁺.

Note that the first column of X is frequently a column of ones corresponding to the
intercept term in the regression model, but this is not important here.
In general, a random vector Z(k×1) is said to be Multivariate Normal distributed with
1) mean vector µ(k×1) and
2) co-variance matrix Σ(k×k)
if its PDF is

p(Z|µ, Σ) = (2π)^(−k/2) det(Σ)^(−1/2) exp{−(1/2)(Z − µ)′Σ⁻¹(Z − µ)},

ie., Z ∼ MNormal(µ, Σ).

Frequentist MLE
The likelihood function (joint density) is simply (where Σ = σ²I)

L(β, σ²|y, X) = (2π)^(−N/2) det(σ²I)^(−1/2) exp{−(1/2)(y − Xβ)′(σ²I)⁻¹(y − Xβ)}
             = (2πσ²)^(−N/2) exp{−(1/(2σ²))(y − Xβ)′(y − Xβ)}

with the MLE for β and σ² being obtained as before from

∂ ln(L(β, σ²|y, X))/∂β = 0
(1/σ²)X′(y − Xβ) = 0

so β̂MLE = (X′X)⁻¹(X′y)

and

∂ ln(L(β, σ²|y, X))/∂σ² = 0
−(N/2)(σ²)⁻¹ + (1/2)(σ²)⁻²(y − Xβ)′(y − Xβ) = 0

so that σ̂²MLE = N⁻¹(y − Xβ̂MLE)′(y − Xβ̂MLE).

The variance/covariance matrix for θ̂MLE = (β̂MLE, σ̂²MLE) is obtained as before from
the inverse of the information matrix, where I(θ) = −E(·|X), ie., conditional on X:

I(θ) = −E[∂² ln(L(β, σ²|y, X))/∂θ∂θ′ | X]

     = −E[ [ −σ⁻²(X′X)     −σ⁻⁴X′ε
             −σ⁻⁴ε′X       (N/2)σ⁻⁴ − σ⁻⁶ε′ε ] | X ]

     = [ σ⁻²(X′X)    0
         0           N/(2σ⁴) ]

yielding

Var(θ̂MLE) = I(θ)⁻¹ = [ σ²(X′X)⁻¹    0
                        0            2σ⁴/N ].

Bayesian
Under a Bayesian setting, using again the same (Jeffreys’) prior
π(θ) = π(β, σ²) ∝ 1/σ² as before in Example 3 (the Normal model), we get:

p(θ|y, X) ∝ (σ²)^(−(N/2+1)) exp{−(1/(2σ²))(y − Xβ)′(y − Xβ)}    (43)

and we can again seek an expression for (y − Xβ)′(y − Xβ) that relates y to the
OLS (and here also ML) estimate β̂ = (X′X)⁻¹(X′y).
Re-writing the term with A = y − Xβ̂ and B = X(β − β̂), we have

(y − Xβ)′(y − Xβ) = [(y − Xβ̂) − X(β − β̂)]′[(y − Xβ̂) − X(β − β̂)]
                  = (A − B)′(A − B)

                  = A′A − A′B − B′A + B′B

where A′A = (y − Xβ̂)′(y − Xβ̂) = SSE, B′B = (β − β̂)′X′X(β − β̂) and

A′B = (y − Xβ̂)′X(β − β̂)
    = [y − X(X′X)⁻¹(X′y)]′X(β − β̂)    [H = X(X′X)⁻¹X′ is the hat matrix]
    = [y − Hy]′X(β − β̂)
    = y′(I − H)′X(β − β̂).    (44)

Because the hat (or projection) matrix H = X(X′X)⁻¹X′ is a symmetric (and
idempotent) matrix, we have H = H′ (and HH = H), so that

(I − H)′X = X − HX
          = X − X(X′X)⁻¹X′X = 0.

Thus A′B = 0 and, similarly, B′A = 0. This yields the result

(y − Xβ)′(y − Xβ) = SSE + (β − β̂)′(X′X)(β − β̂),

where SSE is a scalar. Using this result, we can write (43) as

p(β, σ²|y, X) ∝ (σ²)^(−(N/2+1)) exp{−(1/(2σ²))[SSE + (β − β̂)′(X′X)(β − β̂)]}    (45)

and proceed to find p(σ²|y, X) as with the Normal model, ie., form

p(σ²|y, X) = ∫β p(β, σ²|y, X)dβ
           = (σ²)^(−(N/2+1)) exp{−SSE/(2σ²)}
             × ∫β exp{−(1/(2σ²))(β − β̂)′(X′X)(β − β̂)} dβ    (46)

where the integrand in (46) is the kernel of a Multivariate Normal with mean β̂ and
covariance matrix σ²(X′X)⁻¹, so it will integrate to

(2π)^(p/2) det(σ²(X′X)⁻¹)^(1/2) = (2πσ²)^(p/2) det(X′X)^(−1/2).

We then get, after dropping the (2π)^(p/2) and det(X′X)^(−1/2) constants:

p(σ²|y, X) ∝ (σ²)^(−(N/2+1)) exp{−SSE/(2σ²)} × (σ²)^(p/2)
           ∝ (σ²)^(−((N−p)/2+1)) exp{−SSE/(2σ²)}    (47)

where the expression in (47) is the kernel of an InvGam(α, β) density with
α = (N − p)/2 and β = SSE/2.
So the result for the marginal posterior p(σ²|y, X) from the regression model is
analogous to the Normal model result that we obtained earlier.

The marginal posterior for β is obtained by integrating σ² out of (46), ie., we compute

p(β|y, X) ∝ ∫σ² (σ²)^(−(N/2+1)) exp{−(1/(2σ²))[SSE + (β − β̂)′(X′X)(β − β̂)]} dσ².    (48)

As before, we can recognise the integrand in (48) to be the kernel of an InvGam(α, β)
density, where

α = N/2  and  β = [SSE + (β − β̂)′(X′X)(β − β̂)]/2,

so that

p(β|y, X) ∝ Γ(α)β^(−α)
          ∝ Γ(N/2)([SSE + (β − β̂)′(X′X)(β − β̂)]/2)^(−N/2)
          ∝ [SSE + (β − β̂)′(X′X)(β − β̂)]^(−N/2)
          = [SSE (1 + (β − β̂)′(X′X)(β − β̂)/SSE)]^(−N/2)    (49)
          ∝ [1 + (1/ν)(β − β̂)′(X′X)(β − β̂)/s²]^(−(ν+p)/2)

with Σ⁻¹ = X′X/s²,

p(β|y, X) ∝ [1 + (1/ν)(β − β̂)′Σ⁻¹(β − β̂)]^(−(ν+p)/2)    (50)

where ν = N − p, SSE = νs², Σ = s²(X′X)⁻¹ and p = dim(β), ie., the number
of regressors.
The expression in (50) can be recognised as a Multivariate Students’ t distribution,
denoted by Mt(ν, β̂, Σ), with ν degrees of freedom and with location and scale being
β̂ and Σ, respectively, ie.,

p(β|y, X) = [Γ((ν + p)/2)/(Γ(ν/2)(νπ)^(p/2))] det(Σ)^(−1/2)
            × [1 + (1/ν)(β − β̂)′Σ⁻¹(β − β̂)]^(−(ν+p)/2).
References

Bernardo, José M. and Adrian F. M. Smith (1994): Bayesian Theory, Wiley Series in Probability and Statistics, John
Wiley & Sons.
Casella, George and Roger L. Berger (2001): Statistical Inference, 2nd Edition, Duxbury Press.
Cox, Richard T. (1946): “Probability, frequency and reasonable expectation,” American journal of physics, 14(1), 1–13.
DeGroot, Morris H. and Mark J. Schervish (2010): Probability and Statistics, 4th Edition, Pearson.
Hoff, Peter D. (2009): A First Course in Bayesian Statistical Methods, Springer Verlag.
Kolmogorov, Andrei N. (1933): Grundbegriffe der Wahrscheinlichkeitsrechnung, Springer.
Koop, Gary M. (2003): Bayesian Econometrics, John Wiley & Sons.
Koop, Gary M., Dale J. Poirier and Justin L. Tobias (2007): Bayesian Econometric Methods, Cambridge University
Press.
Leemis, Lawrence M. and Jacquelyn T. McQueston (2008): “Univariate distribution relationships,” The American
Statistician, 62(1), 45–53.
Robert, Christian P. (2007): The Bayesian Choice: From Decision-Theoretic Foundations to Computational
Implementation, Springer Verlag.
Savage, Leonard J. (1954): The Foundations of Statistics, John Wiley and Sons.
Shafer, Glenn and Vladimir Vovk (2006): “The Sources of Kolmogorov’s Grundbegriffe,” Statistical Science, 21(1),
70–98.
Spanos, Aris (1986): Statistical Foundations of Econometric Modelling, Cambridge University Press.
