QR decomposition
The least-squares problem
One of the most important applications of the QR decomposition is the least-squares
solution of a set of overdetermined equations. That is, a set of $m$ linear equations
with $n$ unknowns, where $m > n$. The least-squares problem to be solved is the
minimisation of $\|Ax - b\|_2$, where $\|\cdot\|_2$
is the standard 2-norm, and where $A \in \mathbb{R}^{m \times n}$ with $m > n$ and $b \in \mathbb{R}^m$. In this case, the problem $Ax = b$ will often have no solution, and
thus it is necessary to consider $Ax$ and $b$ as approximately equal, and to
minimise the distance between them by minimising the loss function $\|Ax - b\|_2$.
To solve this least-squares problem, we need to consider the subspace of all vectors in
$\mathbb{R}^m$ that are formed from linear combinations of the columns of $A$. This is
known as the column space of the matrix $A$, and is denoted as $\text{Col } A$. Given that any
linear combination of the columns of $A$ will lie in this space, we can say that
$Ax$ will also lie in $\text{Col } A$ for any $x$.
Now consider a projection of $b$ into the column space of $A$ to give a new vector
$\hat{b}$ (i.e. $\hat{b}$ is the closest point in $\text{Col } A$ to $b$), see the diagram
below. Because $\hat{b}$ is in the column space of $A$, we know that there is another
vector $\hat{x}$ that satisfies

$$A\hat{x} = \hat{b}$$

Since $A\hat{x} = \hat{b}$ is the closest point to $b$ in the column space of $A$, we can therefore
say that $\hat{x}$ is the least-squares solution.
We can show that the vector $b - A\hat{x}$ is orthogonal to $\text{Col } A$ and
therefore also orthogonal to each column of $A$, so we have $a_j^T (b - A\hat{x}) = 0$ for
each column $a_j$ of $A$. Putting these equations together we can write

$$A^T (b - A\hat{x}) = 0$$

or rearranged slightly, we can find the least-squares solution via the
solution of the equation

$$A^T A \hat{x} = A^T b$$
The QR decomposition divides $A = QR$ into an orthogonal matrix $Q$, and an upper
triangular matrix $R$. Most importantly for the least-squares problem, the columns of $Q$
also form an orthonormal basis for $\text{Col } A$ and therefore $\hat{b} = QQ^Tb$.
Given this decomposition, it can be shown that the least-squares solution of
$Ax = b$ is given by

$$\hat{x} = R^{-1}Q^Tb$$

To prove this, we let $\hat{x} = R^{-1}Q^Tb$ and make the following substitutions:

$$A\hat{x} = QR\hat{x} = QRR^{-1}Q^Tb = QQ^Tb = \hat{b}$$

Therefore $A\hat{x} = \hat{b}$, which proves that $\hat{x}$ is the least-squares
solution for $Ax = b$.
Finally, we note that the inverse $R^{-1}$ should not be calculated directly, but
instead $\hat{x}$ should be found by solving the triangular system

$$R\hat{x} = Q^Tb$$
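For example, here is a minimal sketch of this approach using NumPy and SciPy, with randomly generated data standing in for a real problem:

```python
import numpy as np
from scipy.linalg import solve_triangular

# Random overdetermined system (m > n), standing in for real data
rng = np.random.default_rng(42)
m, n = 100, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Reduced QR: Q is (m, n) with orthonormal columns, R is (n, n) upper triangular
Q, R = np.linalg.qr(A)

# Solve R x = Q^T b by back substitution rather than forming R^{-1}
x = solve_triangular(R, Q.T @ b)

# Check against NumPy's built-in least-squares solver
x_check, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_check)
```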
Constructing the QR decomposition
QR decompositions are normally computed via Householder reflections, Givens rotations
or the Gram-Schmidt process. For a brief summary of the first two methods, it is useful
to consider a simple reflection or rotation of a 2D vector. For example,
the matrix

$$G = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

is a rotation matrix that when applied to a vector $x$ will result in $y = Gx$, where
$y$ is $x$ rotated counterclockwise through the angle $\theta$. $G$ is also orthogonal
since $G^TG = I$.
Similarly, a reflection matrix can be constructed as

$$F = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}$$

which when applied to a vector $x$ will result in $y = Fx$, where $y$ is $x$ reflected
across the line at angle $\theta / 2$ to the horizontal axis.
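As a quick numerical illustration of these two matrices (the angle $\theta = \pi/2$ here is chosen arbitrarily):

```python
import numpy as np

theta = np.pi / 2
c, s = np.cos(theta), np.sin(theta)

G = np.array([[c, -s],
              [s,  c]])   # rotation by theta
F = np.array([[c,  s],
              [s, -c]])   # reflection across the line at angle theta / 2

x = np.array([1.0, 0.0])
print(G @ x)    # [0, 1]: x rotated 90 degrees counterclockwise
print(F @ x)    # [0, 1]: x reflected across the 45 degree line
print(G.T @ G)  # identity (up to floating point), confirming G is orthogonal
```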
Rotations and reflections are often useful because they can be selected in order to
introduce zeros to the vector they are applied to. Given an $m \times n$ matrix $A$, a
series of $n$ Householder reflections can be applied to reduce $A$ to an upper
triangular matrix $R$:

$$H_n \cdots H_2 H_1 A = R$$

By setting $Q = H_1 H_2 \cdots H_n$, we can show that $A = QR$, and that $Q$ is an
orthogonal matrix whose first $n$ columns form an orthonormal basis for the column space of $A$.
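Here is a minimal sketch of Householder QR, forming the full matrix $Q$ explicitly for clarity (which, as noted below, is often avoided in practice):

```python
import numpy as np

def householder_qr(A):
    """Full QR via Householder reflections; a teaching sketch, not production code."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for k in range(n):
        x = R[k:, k].copy()
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            continue  # column already zero below the diagonal
        # Householder vector v, reflecting x onto a multiple of the first basis vector
        v = x
        v[0] += np.copysign(norm_x, x[0])  # sign choice avoids cancellation
        v /= np.linalg.norm(v)
        # Apply H_k = I - 2 v v^T to the trailing submatrix, and accumulate Q
        R[k:, k:] -= 2.0 * np.outer(v, v @ R[k:, k:])
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)
    return Q, R

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
Q, R = householder_qr(A)
assert np.allclose(Q @ R, A)
```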
Similarly, a Givens rotation can be used to zero a single component of $A$, so that a
series of rotations can be used to construct the upper triangular matrix $R$:

$$G_k \cdots G_2 G_1 A = R$$

so that $A = QR$, where $Q = G_1^T G_2^T \cdots G_k^T$. For both the Householder and Givens
methods, it is often useful to not construct the full matrix $Q$ but to keep $Q$
factored as an implicit product of either $H_1 H_2 \cdots H_n$ or $G_1^T G_2^T \cdots G_k^T$. Fast
algorithms exist to apply these factored forms to another vector.
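A sketch of a single Givens step, zeroing one subdiagonal entry (the matrix here is arbitrary example data):

```python
import numpy as np

def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    r = np.hypot(a, b)
    return (1.0, 0.0) if r == 0.0 else (a / r, b / r)

A = np.array([[6.0, 5.0, 0.0],
              [5.0, 1.0, 4.0],
              [4.0, 4.0, 3.0]])

# Rotate rows 1 and 2 together so that A[2, 0] becomes zero
c, s = givens(A[1, 0], A[2, 0])
G = np.array([[c, s], [-s, c]])
A[[1, 2], :] = G @ A[[1, 2], :]
print(A)  # A[2, 0] is now (numerically) zero
```

Repeating such steps for every subdiagonal entry reduces $A$ to upper triangular form.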
The final method to construct a QR decomposition is using the Gram-Schmidt process,
which is a process for constructing an orthogonal or orthonormal basis for a given
subspace defined by the span of the set of vectors $a_1, a_2, \ldots, a_n$. If these
vectors are the columns of an $m \times n$ matrix $A$, then the Gram-Schmidt process
can be used to directly construct the orthonormal basis of the column space of $A$ given
by the columns of $Q$, such that $A = QR$, where $R$ is an upper triangular matrix. The matrix $R$ can be
calculated using $R = Q^TA$. Note that the classical Gram-Schmidt process exhibits poor
numerical qualities, therefore a modified version of the algorithm exists, which is
described in the Golub and Van Loan Matrix Computations textbook listed below.
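A minimal sketch of the modified algorithm (assuming $A$ has full column rank):

```python
import numpy as np

def modified_gram_schmidt(A):
    """Reduced QR via modified Gram-Schmidt; assumes A has full column rank."""
    m, n = A.shape
    Q = A.astype(float).copy()
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(Q[:, k])
        Q[:, k] /= R[k, k]
        # Immediately orthogonalise the remaining columns against q_k;
        # this reordering is what improves on classical Gram-Schmidt
        R[k, k + 1:] = Q[:, k] @ Q[:, k + 1:]
        Q[:, k + 1:] -= np.outer(Q[:, k], R[k, k + 1:])
    return Q, R

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
Q, R = modified_gram_schmidt(A)
assert np.allclose(Q @ R, A)
```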
In terms of computational work, the Householder method takes approximately $2mn^2 - 2n^3/3$ flops to
compute the decomposition with $Q$ in factored form, and roughly another $4(m^2n - mn^2 + n^3/3)$ to get the full matrix $Q$,
whereas the Gram-Schmidt method is more efficient at approximately $2mn^2$ flops. However, Householder
is normally preferred in practice, as even with the modified algorithm the numerical
properties of Gram-Schmidt are still poor in comparison with both Householder and
Givens (i.e. the final orthogonality of $Q$ is not ideal), so it is only useful when the
columns of $A$ are already fairly independent. Using Givens rotations the matrix $R$ can
be found in approximately $3n^2(m - n/3)$ flops, or the factorised form of the decomposition can be found
in the same amount of time. The full matrix $Q$ is not normally calculated via Givens
rotations. Using Givens rotations is most useful when there are only a few non-zeros in
$A$, and is more easily parallelised than Householder.
Other Reading
The discussion in this section relied on concepts such as orthogonal and orthonormal
vector pairs, vector spaces and subspaces, and basis vectors. It is well worth
investigating these topics further in:
- Linear algebra and its applications by David C. Lay. Chapters 4 & 6.
Additional reading on the QR decomposition can be found at:
- Linear algebra and its applications by David C. Lay. Chapter 6.4
- Golub, G. H. & Van Loan, C. F. Matrix Computations, 3rd Ed. (Johns Hopkins University Press, 1996). Chapter 5.2
Software
Problems
Model fitting
For this exercise we will be using some data on Oxford's weather, which is hosted by
Saad Jbabdi from the Wellcome Centre for
Integrative NeuroImaging (FMRIB), and which can be obtained
here.
We wish to fit a quadratic model of the form $y = ax^2 + bx + c$ to the hours of
sunlight observed in Oxford (7th column in
OxfordWeather.txt
) versus the month (2nd
column). The dataset in question has $m$ data points, so our model gives us a set of $m$
equations for the 3 unknowns $a$, $b$, and $c$ that are overdetermined, that is, for
each data point $(x_i, y_i)$ for $i = 1, \ldots, m$ we have:

$$y_i = ax_i^2 + bx_i + c$$

Use a QR decomposition to find the least-squares solution to these equations (you can
check it using
np.linalg.lstsq
if you like), and therefore fit the model to the data.
Plot the model and the data side by side to qualitatively evaluate the fit.
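One possible sketch of a solution is given below. It assumes OxfordWeather.txt is whitespace-delimited, with the month in the 2nd column and the hours of sunlight in the 7th (as stated above); the coefficient names follow the quadratic model above:

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt("OxfordWeather.txt")
x = data[:, 1]  # month (2nd column)
y = data[:, 6]  # hours of sunlight (7th column)

# Design matrix for the model y = a x^2 + b x + c
A = np.stack([x**2, x, np.ones_like(x)], axis=1)

# Least-squares fit via the reduced QR decomposition
Q, R = np.linalg.qr(A)
a, b, c = np.linalg.solve(R, Q.T @ y)  # solve R theta = Q^T y

# Check against NumPy's least-squares routine
assert np.allclose([a, b, c], np.linalg.lstsq(A, y, rcond=None)[0])

# Plot the data and the fitted model side by side
xs = np.linspace(x.min(), x.max(), 100)
fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
ax1.plot(x, y, ".")
ax1.set(title="data", xlabel="month", ylabel="hours of sun")
ax2.plot(x, y, ".", alpha=0.3)
ax2.plot(xs, a * xs**2 + b * xs + c, "r")
ax2.set(title="quadratic fit", xlabel="month")
plt.show()
```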