In this module we'll talk about the general linear model, or GLM, approach to analyzing fMRI data; in particular, how we can estimate the parameters of the GLM.

You'll recall that a standard GLM can be written in the following format: Y is equal to X beta plus epsilon, where epsilon is normally distributed with mean zero and variance-covariance matrix V. Here Y represents the fMRI data from a single voxel, strung out over time; X is the design matrix; beta are the unknown regression coefficients which we want to estimate; and epsilon is the noise. For simplicity, let's assume that we're looking at a model where the noise is i.i.d., that is, independent and identically distributed. In that case V is equal to the identity matrix times sigma squared, so the only unknown parameter in the variance-covariance matrix is sigma squared.

In this particular model, the matrices X and Y are assumed to be known, and the noise is considered to be uncorrelated. Y is known because this is simply the data that we've extracted from the voxel. X is known because this is the design matrix; this is where we put all our information about the tasks that were performed. We'll talk a lot about how to design the design matrix, so to speak. The unknown parameters in this model are the betas, which are the weights on the columns of the design matrix, and sigma squared. So our goal here is to find good estimates of beta and sigma squared.
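To make this concrete, here is a minimal numpy sketch (not from the lecture itself) that simulates single-voxel data from the model Y = X beta + epsilon under the i.i.d. noise assumption; the design matrix, true betas, and noise level are all made-up toy values:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 200                       # number of time points
X = np.column_stack([
    np.ones(n),               # intercept column
    rng.standard_normal(n),   # toy task regressor (stand-in for a convolved stimulus)
])
beta_true = np.array([1.0, 0.5])   # hypothetical "true" regression weights
sigma2 = 2.0                       # hypothetical true residual variance

# Y = X beta + epsilon, with epsilon ~ N(0, sigma^2 I): the i.i.d. case
epsilon = rng.normal(0.0, np.sqrt(sigma2), size=n)
Y = X @ beta_true + epsilon
```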
In principle, what we'll try to do, in least squares regression at least, is to find the value of beta that minimizes the following quantity: Y minus X beta, transpose, times Y minus X beta. Basically, this quantity is the sum of the squared residuals in matrix form, so we want the fitted values to be as close to the data as possible. The ordinary least squares (OLS) solution to this problem, when epsilon is i.i.d., is beta hat equal to X transpose X, inverse, times X transpose Y. Here we see a little cartoon of what's going on: Y, the data, is a vector in n-dimensional space, where n is the number of time points, and we're finding the point in the space spanned by the columns of the design matrix that minimizes the distance to the data.

Now, this particular ordinary least squares solution has a couple of interesting properties. One is that it's equivalent to the maximum likelihood estimate; we'll revisit that concept a little bit later. In addition, it's unbiased. That means that in expectation, beta hat is going to be equal to beta, so if we were to repeat this experiment lots of times, on average we would get the correct result. The variance of beta hat is equal to sigma squared times X transpose X inverse, so it depends on both sigma squared and on the design matrix itself.

So the question one can ask here is: can we do better than this? Can we find another unbiased estimate that has a smaller variance? If that were the case, then we would choose it instead of the OLS solution. The answer is no. According to something called the Gauss-Markov theorem, any other linear unbiased estimator of beta will have a variance at least as large as that of the OLS solution. What we mean here is: assume that we have another estimate, beta tilde, that's also an unbiased estimator of beta, so beta tilde on average is going to be equal to beta.
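Continuing the toy sketch above, here is a minimal illustration of the OLS formula and the variance expression just described; np.linalg.lstsq gives the same estimate through a numerically more stable route:

```python
# OLS: beta_hat = (X'X)^{-1} X'Y; solve() avoids forming the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# The same estimate via numpy's built-in least-squares solver
beta_hat_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Var(beta_hat) = sigma^2 (X'X)^{-1}; here we plug in the simulation's
# true sigma^2 purely for illustration (in practice it must be estimated)
var_beta_hat = sigma2 * np.linalg.inv(X.T @ X)
```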
Then, according to the Gauss-Markov theorem, the variance of beta hat is always going to be as small as or smaller than the variance of any given unbiased estimator beta tilde. In this case, we say that beta hat is the best linear unbiased estimator, the BLUE, of beta. That means that among the class of all possible linear unbiased estimators of beta, beta hat is the one that has the smallest variance. And we like to have small variance, because that means we're close to the true value. So, summing up: if epsilon is i.i.d., then the ordinary least squares (OLS) estimate is optimal. We have Y equal to X beta plus epsilon as our model, and beta hat is X transpose X, inverse, times X transpose Y.

Now, what happens if epsilon is not i.i.d., so that the variance of epsilon is equal to, say, V times sigma squared, where V is not equal to the identity matrix? Well, in that case, ordinary least squares is no longer optimal. Instead, something called generalized least squares, or GLS, is optimal. The GLS solution looks very similar to the OLS solution: beta hat is X transpose V inverse X, inverse, times X transpose V inverse Y. What we see is that we had to include V inverse in two places. Now, if V is equal to the identity matrix, then the V inverses disappear from this equation and we get back the OLS solution, so this is a generalization of the OLS solution. For this reason, we'll be working with the GLS solution rather than the OLS solution, because it encapsulates the OLS solution. So in summary, we have Y equal to X beta plus epsilon, and we have the best linear unbiased estimator for this regardless of what the value of V is.

Now, this is the best linear unbiased estimator under the condition that we know what V is. Oftentimes, of course, we don't know that, and so we have to estimate V as well. How would we go about doing that? Well, the first thing we do is calculate the fitted values, Y hat equal to X beta hat; that's our estimated signal. Then we look at the difference between the data and the estimated signal, and that gives us the residuals. We can rewrite the residuals by substituting X beta hat for Y hat and plugging in the value of beta hat; the result can be written simply as R times Y, where capital R is the so-called residual-inducing matrix. We'll need these terms a little bit later on: little r refers to the residuals, the data minus the fitted values, and capital R is the residual-inducing matrix, the matrix that we apply to the data to get the residuals.

Even if we assume that epsilon is i.i.d., we still need to estimate the residual variance sigma squared. Our estimate is going to be based on the residuals: it's r transpose r, the sum of the squared residuals in matrix form, divided by the trace of capital R, the residual-inducing matrix. For OLS, this latter term is just N minus p, the number of time points minus the number of columns in the design matrix. This might be familiar if you have taken a regression class. However, estimating the parameters of V when V is not equal to the identity matrix is much more difficult.
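Continuing the same sketch, here is one way the residual-inducing matrix, the variance estimate r transpose r over trace of R, and the GLS estimator could look in numpy; the AR(1)-style V below is purely a hypothetical stand-in, since models for V are discussed later in the course:

```python
n, p = X.shape

# Residual-inducing matrix for OLS: R = I - X (X'X)^{-1} X', so that r = R Y
R = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
r = R @ Y                            # residuals: data minus fitted values

# sigma^2 estimate: r'r / trace(R); for OLS, trace(R) = n - p
sigma2_hat = (r @ r) / np.trace(R)

# GLS for a known V: beta_hat = (X' V^{-1} X)^{-1} X' V^{-1} Y
rho = 0.3                            # toy AR(1) correlation parameter
V = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
V_inv = np.linalg.inv(V)
beta_hat_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ Y)
```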
We'll return to this problem a little bit later on, after we've talked about different models that we could use for V. Okay, so that's the end of the module. Here we talked about how we can estimate the parameters of the GLM. In the coming modules we'll talk about how we can refine the GLM to fit our data better: how we choose the columns that go into the design matrix, and what our variance-covariance matrix should look like. Okay, I'll see you then. Bye.