Cholesky decomposition
Symmetric positive definite matrices are a very special type of matrix that often
arises in practice. From a computational point of view, this class of matrix is very
attractive because it is possible to decompose a symmetric positive definite matrix
$A$ very efficiently into a single lower triangular matrix $L$ so that $A = LL^T$.
A matrix $A \in \mathbb{R}^{n \times n}$ is positive definite if $x^T A x > 0$ for any nonzero $x \in \mathbb{R}^n$.
This statement by itself is not terribly intuitive, so let's also look at an
example, the $2 \times 2$ symmetric matrix

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{bmatrix}.$$

If $A$ is symmetric positive definite (SPD) then

$$\begin{aligned}
x = (1, 0)^T &\implies x^T A x = a_{11} > 0, \\
x = (0, 1)^T &\implies x^T A x = a_{22} > 0, \\
x = (1, 1)^T &\implies x^T A x = a_{11} + 2a_{12} + a_{22} > 0, \\
x = (1, -1)^T &\implies x^T A x = a_{11} - 2a_{12} + a_{22} > 0.
\end{aligned}$$
The first two equations show that the diagonal entries of $A$ must be positive, and
combining the last two equations implies $|a_{12}| < (a_{11} + a_{22})/2$, that is,
that the matrix has much of its "mass" on the diagonal (note: this is not the same as
the matrix being diagonally dominant, where $|a_{ii}| \ge \sum_{j \ne i} |a_{ij}|$). These two observations for our $2 \times 2$ matrix also apply for a general
SPD matrix. One of the very nice consequences of this "weighty" diagonal
for SPD matrices is that it precludes the need for pivoting.
It can be shown that if $A$ is an SPD matrix, then the $A = LDL^T$ decomposition exists and
that $D$ has positive diagonal entries. Therefore, it is
straightforward to see that $A = LDL^T = \tilde{L}\tilde{L}^T$, where $\tilde{L} = LD^{1/2}$. The decomposition $A = \tilde{L}\tilde{L}^T$ is known as the Cholesky decomposition and
can be efficiently constructed in $n^3/3$ flops. There are a number of algorithms to
construct this decomposition, and both the Wikipedia
entry and Chapter 4.2 of the
Matrix Computations textbook by Golub and Van Loan give a number of different variants.
Note that an $LDL^T$ decomposition can also be used to calculate a Cholesky decomposition,
and this could be a more efficient approach since (a) the SPD structure means that we can
neglect pivoting in the $LDL^T$ decomposition, and (b) the $LDL^T$ decomposition does not
require taking the square root of the diagonal elements.
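To make this concrete, the sketch below implements one of the simplest variants (often called Cholesky-Banachiewicz) in Python with NumPy. It is illustrative only; in practice you would call a library routine such as `numpy.linalg.cholesky`.

```python
import numpy as np

def cholesky(A):
    """Minimal sketch of the Cholesky-Banachiewicz algorithm.

    Factors an SPD matrix A into a lower triangular L with A = L @ L.T.
    No pivoting is required because A is assumed to be SPD.
    """
    n = A.shape[0]
    L = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1):
            # Subtract the contribution of the already-computed columns
            s = A[i, j] - L[i, :j] @ L[j, :j]
            if i == j:
                L[i, i] = np.sqrt(s)  # s > 0 for an SPD matrix (up to round-off)
            else:
                L[i, j] = s / L[j, j]
    return L

# Quick check on a small SPD matrix
A = np.array([[4.0, 2.0], [2.0, 3.0]])
assert np.allclose(cholesky(A) @ cholesky(A).T, A)
```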
Other Reading
- Golub, G. H. & Van Loan, C. F. Matrix Computations, 3rd Ed. (Johns Hopkins University Press, 1996). Chapter 4.2
Software

The Cholesky decomposition is implemented in the standard Python linear algebra libraries, for example:

- `numpy.linalg.cholesky`
- `scipy.linalg.cholesky`
- `scipy.linalg.cho_factor` and `scipy.linalg.cho_solve`
Problems
Sampling random fields
Imagine that we wanted to sample an array of values $x_i$, for $i = 1, \ldots, n$, where each
value is sampled from an independent normal distribution with standard deviation
$\sigma$. This could be achieved, for example, by sampling from a normal distribution with unit
standard deviation, a function that typically exists in any computer language, then
multiplying by $\sigma$:

$$x_i = \sigma \eta_i,$$

where $\eta_i \sim \mathcal{N}(0, 1)$.
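For example, in Python with NumPy (the values of `n` and `sigma` below are illustrative):

```python
import numpy as np

n = 100
sigma = 0.5  # illustrative standard deviation

rng = np.random.default_rng()
eta = rng.standard_normal(n)  # eta_i ~ N(0, 1)
x = sigma * eta               # x_i ~ N(0, sigma^2)
```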
Now imagine that instead of an independent normal distribution you wish to sample
$x \sim \mathcal{N}(0, \Sigma)$ from a multivariate normal distribution with some
covariance matrix $\Sigma$.
We can achieve this in practice by using the Cholesky decomposition. A covariance
matrix is a symmetric positive semidefinite matrix (i.e. $v^T \Sigma v \ge 0$ for any $v$), and
therefore can be decomposed into $\Sigma = LL^T$. We can then draw a sample from
$\mathcal{N}(0, \Sigma)$ by scaling an independently generated random vector $\eta$
by $L$:

$$x = L\eta,$$

where each element of the vector $\eta$ is $\eta_i \sim \mathcal{N}(0, 1)$.
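A minimal sketch of this recipe, assuming `Sigma` is a valid covariance matrix you have already constructed:

```python
import numpy as np

def sample_mvn(Sigma, rng=None):
    """Draw one sample x ~ N(0, Sigma) via the Cholesky factor of Sigma."""
    rng = rng if rng is not None else np.random.default_rng()
    L = np.linalg.cholesky(Sigma)              # Sigma = L @ L.T
    eta = rng.standard_normal(Sigma.shape[0])  # independent eta_i ~ N(0, 1)
    return L @ eta                             # cov(L @ eta) = L @ L.T = Sigma
```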
Write Python code to randomly sample an $n$-dimensional vector $x$ from
- an independent normal distribution with variance $\sigma^2$.
- a multivariate normal distribution using a covariance matrix $\Sigma_{ij} = \sigma^2 \exp\left(-\frac{(i-j)^2}{2\lambda^2}\right)$. Try different values for the magnitude $\sigma$ and lengthscale $\lambda$ parameters and observe their effect on the sampled $x$. Hint: while this expression for $\Sigma$ is guaranteed to be positive definite for all values of $\sigma$ and $\lambda$, numerical round-off can mean that the Cholesky decomposition can fail. To guarantee a positive definite $\Sigma$, try adding a small amount (e.g. 1e-5) to the diagonal of $\Sigma$. This is equivalent to adding a very small amount of independent normal noise to $x$ (a sketch of this construction follows below).
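A sketch of the second part, assuming the squared-exponential covariance written above; the parameter values are illustrative, and the `1e-5` jitter term is the hint's fix for round-off:

```python
import numpy as np

def squared_exp_cov(n, sigma, lam):
    """Sigma_ij = sigma^2 * exp(-(i - j)^2 / (2 * lam^2))."""
    idx = np.arange(n)
    sq_dist = (idx[:, None] - idx[None, :]) ** 2
    return sigma**2 * np.exp(-sq_dist / (2.0 * lam**2))

n = 200
Sigma = squared_exp_cov(n, sigma=1.0, lam=10.0)  # illustrative parameter values
Sigma += 1e-5 * np.eye(n)  # jitter the diagonal so the factorization cannot fail
L = np.linalg.cholesky(Sigma)
x = L @ np.random.default_rng().standard_normal(n)  # one correlated sample
```

Larger values of `lam` should produce smoother samples, while `sigma` scales their overall amplitude.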
Likelihood
Now imagine that we have a vector of measurements $y$, and we assume that a
suitable model for these measurements is that they are generated from a zero-mean
multivariate normal distribution, i.e.

$$y \sim \mathcal{N}(0, \Sigma).$$
We assume that the covariance matrix is of the following squared-exponential form, with two parameters,
a magnitude $\sigma$ and a lengthscale $\lambda$:

$$\Sigma_{ij} = \sigma^2 \exp\left(-\frac{(i-j)^2}{2\lambda^2}\right).$$
We can write down the likelihood of the covariance parameters $(\sigma, \lambda)$, given
a dataset $y$, by using the probability density function (PDF) for a
zero-mean multivariate normal distribution:

$$\mathcal{L}(\sigma, \lambda \mid y) = \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} \exp\left(-\frac{1}{2} y^T \Sigma^{-1} y\right).$$

Typically we work with the log of the likelihood for numerical reasons, which is

$$\log \mathcal{L} = -\frac{1}{2} y^T \Sigma^{-1} y - \frac{1}{2} \log |\Sigma| - \frac{n}{2} \log(2\pi).$$
Generate a simulated dataset using your code for question (2), using
"true" parameters $\sigma$ and $\lambda$ of your choice. Then calculate
the log-likelihood using the Cholesky decomposition to efficiently calculate the
log determinant and the inverse of the covariance matrix. Vary $\sigma$ and $\lambda$
and satisfy yourself that the maximum of the likelihood occurs at your "true"
parameters. In practice, when you don't know the true parameters, you could use an
optimisation algorithm to automatically determine the most likely model
parameters that give rise to your data.
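A sketch of the log-likelihood calculation, assuming `Sigma` has been built as in the previous problem. `scipy.linalg.cho_factor` and `cho_solve` reuse the Cholesky factor so the explicit inverse is never formed, and the log determinant comes from the factor's diagonal:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def log_likelihood(y, Sigma):
    """log L = -0.5 * (y^T Sigma^{-1} y + log|Sigma| + n * log(2*pi))."""
    n = len(y)
    c, low = cho_factor(Sigma)         # Cholesky factorization of Sigma
    alpha = cho_solve((c, low), y)     # alpha = Sigma^{-1} @ y, without inverting
    log_det = 2.0 * np.sum(np.log(np.diag(c)))  # log|Sigma| = 2 * sum(log L_ii)
    return -0.5 * (y @ alpha + log_det + n * np.log(2.0 * np.pi))
```

Evaluating this on a grid of $(\sigma, \lambda)$ values should show a maximum near the parameters used to simulate $y$.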