In this module we'll talk about the general linear model, or GLM, approach to analyzing fMRI data; in particular, how we can estimate the parameters of the GLM.

You'll recall that a standard GLM can be written in the following format: Y is equal to X beta plus epsilon, where epsilon is normally distributed with mean zero and variance-covariance matrix V. Here Y represents the fMRI data from a single voxel, strung out over time; X is the design matrix; beta are the unknown regression coefficients which we want to estimate; and epsilon is the noise. For simplicity, let's assume that we're looking at a model where the noise is i.i.d., that is, independent and identically distributed. In that case V is equal to the identity matrix times sigma squared, so the only unknown parameter in the variance-covariance matrix is sigma squared.

In this particular model, the matrices X and Y are assumed to be known, and the noise is considered to be uncorrelated. Y is known because this is simply the data that we've extracted from the voxel. X is known because this is the design matrix; this is where we put all our information about the tasks that were performed. We'll talk a lot about how to design the design matrix, so to speak. The unknown parameters in this model are the betas, which are the weights on the columns of the design matrix, and sigma squared. So our goal here is to find good estimates of beta and sigma squared.
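To make this concrete, here is a minimal numpy sketch (not from the lecture itself) that simulates single-voxel data from the model Y = X beta + epsilon under the i.i.d. noise assumption; the design matrix, true betas, and noise level are all made-up toy values:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 200                       # number of time points
X = np.column_stack([
    np.ones(n),               # intercept column
    rng.standard_normal(n),   # toy task regressor (stand-in for a convolved stimulus)
])
beta_true = np.array([1.0, 0.5])   # hypothetical "true" regression weights
sigma2 = 2.0                       # hypothetical true residual variance

# Y = X beta + epsilon, with epsilon ~ N(0, sigma^2 I): the i.i.d. case
epsilon = rng.normal(0.0, np.sqrt(sigma2), size=n)
Y = X @ beta_true + epsilon
```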
In principle, what we'll try to do, in least squares regression at least, is to find the value of beta that minimizes the following quantity: Y minus X beta, transpose, times Y minus X beta. Basically, this quantity is the sum of the squared residuals in matrix form, so we want the fitted values to be as close to the data as possible. The ordinary least squares (OLS) solution to this problem, when epsilon is i.i.d., is beta hat equal to X transpose X, inverse, times X transpose Y. Here we see a little cartoon of what's going on: Y, the data, is a vector in n-dimensional space, where n is the number of time points, and we're finding the point in the space spanned by the columns of the design matrix that minimizes the distance to the data.

Now, this particular ordinary least squares solution has a couple of interesting properties. One is that it's equivalent to the maximum likelihood estimate; we'll revisit that concept a little bit later. In addition, it's unbiased. That means that in expectation, beta hat is going to be equal to beta, so if we were to repeat this experiment lots of times, on average we would get the correct result. The variance of beta hat is equal to sigma squared times X transpose X inverse, so it depends on both sigma squared and on the design matrix itself.

So the question one can ask here is: can we do better than this? Can we find another unbiased estimate that has a smaller variance? If that were the case, then we would choose it instead of the OLS solution. The answer is no. According to something called the Gauss-Markov theorem, any other linear unbiased estimator of beta will have a variance at least as large as that of the OLS solution. What we mean here is: assume that we have another estimate, beta tilde, that's also an unbiased estimator of beta, so beta tilde on average is going to be equal to beta.
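Continuing the toy sketch above, here is a minimal illustration of the OLS formula and the variance expression just described; np.linalg.lstsq gives the same estimate through a numerically more stable route:

```python
# OLS: beta_hat = (X'X)^{-1} X'Y; solve() avoids forming the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# The same estimate via numpy's built-in least-squares solver
beta_hat_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Var(beta_hat) = sigma^2 (X'X)^{-1}; here we plug in the simulation's
# true sigma^2 purely for illustration (in practice it must be estimated)
var_beta_hat = sigma2 * np.linalg.inv(X.T @ X)
```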
Then, according to the Gauss-Markov theorem, the variance of beta hat is always going to be as small as or smaller than the variance of any given unbiased estimator beta tilde. In this case, we say that beta hat is the best linear unbiased estimator, the BLUE, of beta. That means that among the class of all possible linear unbiased estimators of beta, beta hat is the one that has the smallest variance. And we like to have small variance, because that means we're close to the true value. So, summing up: if epsilon is i.i.d., then the ordinary least squares (OLS) estimate is optimal. We have Y equal to X beta plus epsilon as our model, and beta hat is X transpose X, inverse, times X transpose Y.

Now, what happens if epsilon is not i.i.d., so that the variance of epsilon is equal to, say, V times sigma squared, where V is not equal to the identity matrix? Well, in that case, ordinary least squares is no longer optimal. Instead, something called generalized least squares, or GLS, is optimal. The GLS solution looks very similar to the OLS solution: beta hat is X transpose V inverse X, inverse, times X transpose V inverse Y. What we see is that we had to include V inverse in two places. Now, if V is equal to the identity matrix, then the V inverses disappear from this equation and we get back the OLS solution, so this is a generalization of the OLS solution. For this reason, we'll be working with the GLS solution rather than the OLS solution, because it encapsulates the OLS solution. So in summary, we have Y equal to X beta plus epsilon, and we have the best linear unbiased estimator for this regardless of what the value of V is.

Now, this is the best linear unbiased estimator under the condition that we know what V is. Oftentimes, of course, we don't know that, and so we have to estimate V as well. How would we go about doing that? Well, the first thing we do is calculate the fitted values, Y hat equal to X beta hat; that's our estimated signal. Then we look at the difference between the data and the estimated signal, and that gives us the residuals. We can rewrite the residuals by substituting X beta hat for Y hat and plugging in the value of beta hat; the result can be written simply as R times Y, where capital R is the so-called residual-inducing matrix. We'll need these terms a little bit later on: little r refers to the residuals, the data minus the fitted values, and capital R is the residual-inducing matrix, the matrix that we apply to the data to get the residuals.

Even if we assume that epsilon is i.i.d., we still need to estimate the residual variance sigma squared. Our estimate is going to be based on the residuals: it's r transpose r, the sum of the squared residuals in matrix form, divided by the trace of capital R, the residual-inducing matrix. For OLS, this latter term is just N minus p, the number of time points minus the number of columns in the design matrix. This might be familiar if you have taken a regression class. However, estimating the parameters of V when V is not equal to the identity matrix is much more difficult.
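Continuing the same sketch, here is one way the residual-inducing matrix, the variance estimate r transpose r over trace of R, and the GLS estimator could look in numpy; the AR(1)-style V below is purely a hypothetical stand-in, since models for V are discussed later in the course:

```python
n, p = X.shape

# Residual-inducing matrix for OLS: R = I - X (X'X)^{-1} X', so that r = R Y
R = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
r = R @ Y                            # residuals: data minus fitted values

# sigma^2 estimate: r'r / trace(R); for OLS, trace(R) = n - p
sigma2_hat = (r @ r) / np.trace(R)

# GLS for a known V: beta_hat = (X' V^{-1} X)^{-1} X' V^{-1} Y
rho = 0.3                            # toy AR(1) correlation parameter
V = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
V_inv = np.linalg.inv(V)
beta_hat_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ Y)
```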
We'll return to this problem a little bit later on, after we've talked about different models that we could use for V. Okay, so that's the end of the module. Here we talked about how we can estimate the parameters of the GLM. In the coming modules we'll talk about how we can refine the GLM to fit our data better: how we choose the columns that go into the design matrix, and what our variance-covariance matrix should look like. Okay, I'll see you then. Bye.