Trust Region Methods
Saddle points
Saddle points pose a particular challenge in non-linear optimisation, especially in
higher dimensions. The plots below show two examples of saddle points in two dimensions.
Like local minima and maxima, these are stationary points where the gradient of the
function is zero, but the value of the function rises along certain directions and
falls along others (left plot). An alternative type of saddle point arises when the
Hessian is singular; these are characterised by a plateau around the stationary point,
like the monkey saddle depicted in the plot to the right.
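A standard example of this second type (an illustration, not necessarily the function used in the plot) is the monkey saddle

$$f(x, y) = x^3 - 3xy^2, \qquad \nabla^2 f(x, y) = \begin{pmatrix} 6x & -6y \\ -6y & -6x \end{pmatrix},$$

whose gradient and Hessian both vanish at the origin, so the quadratic term carries no information about the shape of the surface there.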
Near the location of the critical point $x^*$, the function can be restated using the quadratic form (see also Pascanu, Dauphin and Ganguli 2014):

$$f(x^* + \Delta x) = f(x^*) + \frac{1}{2} \sum_i \lambda_i \Delta v_i^2,$$

where $\lambda_i$ is the $i$th eigenvalue of the Hessian, and $\Delta v_i$ is the motion of $\Delta x$ along the $i$th eigenvector of the Hessian. If the $i$th eigenvalue is negative/positive then along the $i$th eigenvector the function will achieve a maximum/minimum at $x^*$.
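For a concrete two-dimensional example (a toy function, not one of the plotted surfaces), take $f(x, y) = \tfrac{1}{2}(x^2 - y^2)$ at the critical point $x^* = (0, 0)$. The Hessian is $\mathrm{diag}(1, -1)$, so $\lambda_1 = 1$ and $\lambda_2 = -1$, and

$$f(x^* + \Delta x) = \tfrac{1}{2}\left(\Delta v_1^2 - \Delta v_2^2\right),$$

which achieves a minimum at the origin along the first eigenvector and a maximum along the second.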
Gradient descent algorithms will move away from or towards $x^*$ with a
step given by $-\lambda_i \Delta v_i$ along the $i$th eigenvector. So for negative eigenvalues the
motion will be towards lower values of $f$, away from $x^*$. For
positive eigenvalues the motion will be towards lower values of $f$, towards
$x^*$. The problem here is the size of the step, which
is very small for small values of $|\lambda_i|$.
Newton methods rescale the step size by $1/\lambda_i$ so that it becomes
$-\Delta v_i$. For negative eigenvalues, this has the undesirable
characteristic that these methods move towards increasing values of $f$ (i.e.
towards the critical point) along the corresponding eigenvectors. Since for
positive eigenvalues the motion is also towards the critical point, this means
that saddle points act as attractors for these types of methods.
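A minimal numerical sketch of this behaviour, using the toy saddle $f(x, y) = \tfrac{1}{2}(x^2 - y^2)$ introduced above (the step size and iteration count here are illustrative choices, not taken from the text):

```python
import numpy as np

# Toy saddle f(x, y) = 0.5 * (x**2 - y**2); Hessian eigenvalues are +1 and -1.
grad = lambda x: np.array([x[0], -x[1]])
hess = np.array([[1.0, 0.0], [0.0, -1.0]])

x_gd = np.array([0.5, 0.5])
x_newton = np.array([0.5, 0.5])
for _ in range(20):
    x_gd = x_gd - 0.1 * grad(x_gd)                                # gradient descent step
    x_newton = x_newton - np.linalg.solve(hess, grad(x_newton))   # Newton step

print(x_gd)      # approximately [0.06, 3.36]: escapes along the negative eigenvector
print(x_newton)  # [0.0, 0.0]: the saddle point attracts the Newton iteration
```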
Trust region methods restate the optimisation problem as a sequence of
optimisations of a second order approximation to $f$
in a local trust region surrounding the current point $x_k$. The exact
solution to each of these subproblems can be shown to be
$p = -(B + \lambda I)^{-1} g$, where the value of $\lambda \ge 0$ is related to
the size of the trust region. In comparison with the previous methods above,
this is equivalent to moving with a step given by
$-\frac{\lambda_i}{\lambda_i + \lambda} \Delta v_i$ along each eigenvector. As long as $\lambda$ is chosen to be larger
than the magnitude of the most negative eigenvalue then the direction of each step is now always
towards more negative values of $f$. As long as $\lambda$ is small
compared with $\lambda_i$ then we avoid the small step sizes associated with
gradient descent.
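Continuing the toy example from above (again only a sketch, with $\lambda = 1.5$ chosen arbitrarily to exceed the magnitude of the most negative eigenvalue):

```python
import numpy as np

# Toy saddle again: Hessian eigenvalues are +1 and -1, gradient g = (x, -y).
hess = np.array([[1.0, 0.0], [0.0, -1.0]])
x = np.array([0.5, 0.5])
g = np.array([x[0], -x[1]])

lam = 1.5  # damping chosen larger than |most negative eigenvalue| = 1
p = -np.linalg.solve(hess + lam * np.eye(2), g)

print(p)          # [-0.2, 1.0]: towards the minimum in x, away from the saddle in y
print(g @ p < 0)  # True: the damped step is a descent direction
```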
Trust region methods
Like many line search methods, trust region methods also use the second order Taylor
expansion of $f$ around $x_k$:

$$m_k(p) = f_k + g_k^T p + \frac{1}{2} p^T B_k p,$$

where $f_k = f(x_k)$, $g_k = \nabla f(x_k)$, and $B_k$ is an approximation to the Hessian matrix or the Hessian itself. Trust region
methods aim to find the $p$ that minimises $m_k$ in a local trust region $\|p\| \le \Delta_k$ around the current point $x_k$, where $\Delta_k$ is the trust region radius.
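As a small sketch of how the model might be coded (the function name and arguments are illustrative choices, assuming $f_k$, $g_k$ and $B_k$ have already been evaluated at $x_k$):

```python
import numpy as np

def quadratic_model(f_k, g_k, B_k):
    """Return the second order Taylor model m_k(p) = f_k + g_k.p + 0.5 p.B_k.p."""
    def m_k(p):
        p = np.asarray(p, dtype=float)
        return f_k + g_k @ p + 0.5 * p @ B_k @ p
    return m_k
```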
Solving the minimisation given above is normally done approximately, with different
trust region methods varying in how the approximation is achieved. Choosing the
trust-region radius $\Delta_k$ is fundamental to this class of methods, and is done by comparing
the actual to the predicted reduction in the function value:

$$\rho_k = \frac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)}.$$

Since the predicted reduction $m_k(0) - m_k(p_k)$ is always positive, if $\rho_k$ is negative then the actual
function value is increasing, the step is rejected and the trust region radius
$\Delta_k$ is decreased in order to improve the approximate model $m_k$. If $\rho_k$ is
positive but much smaller than one then we do not alter $\Delta_k$. If $\rho_k$ is close
to or greater than 1 we can be confident in our model and thus increase $\Delta_k$. The
general algorithm for a trust region method (reproduced from the text by Nocedal and
Wright cited below) is:
Trust region algorithm
Given a starting point $x_0$, $\hat{\Delta} > 0$, $\Delta_0 \in (0, \hat{\Delta})$, and $\eta \in [0, \tfrac{1}{4})$:

for $k = 0, 1, 2, \dots$
    Obtain $p_k$ by (approximately) minimising $m_k(p)$ where $\|p\| \le \Delta_k$
    $\rho_k = \dfrac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)}$
    if $\rho_k < \tfrac{1}{4}$
        $\Delta_{k+1} = \tfrac{1}{4} \Delta_k$
    else
        if $\rho_k > \tfrac{3}{4}$ and $\|p_k\| = \Delta_k$
            $\Delta_{k+1} = \min(2 \Delta_k, \hat{\Delta})$
        else
            $\Delta_{k+1} = \Delta_k$
    if $\rho_k > \eta$
        $x_{k+1} = x_k + p_k$
    else
        $x_{k+1} = x_k$
end for.
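A sketch of how this outer loop might look in code (assuming NumPy; the constants follow the algorithm above, while names such as `trust_region` and `solve_subproblem`, the default parameter values, and the gradient-based stopping test are placeholder choices):

```python
import numpy as np

def trust_region(f, grad, hess, x0, solve_subproblem,
                 delta0=1.0, delta_max=10.0, eta=0.15, max_iter=100):
    """Outer trust region loop following the algorithm above (a sketch only).

    solve_subproblem(g, B, delta) must return a step p with ||p|| <= delta,
    for example the Cauchy point or dogleg routines given below.
    """
    x, delta = np.asarray(x0, dtype=float), delta0
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) < 1e-8:
            break                                 # simple stopping test (not part of the algorithm above)
        p = solve_subproblem(g, B, delta)
        predicted = -(g @ p + 0.5 * p @ B @ p)    # m_k(0) - m_k(p_k)
        actual = f(x) - f(x + p)
        rho = actual / predicted if predicted > 0 else 0.0
        # Update the trust region radius.
        if rho < 0.25:
            delta = 0.25 * delta
        elif rho > 0.75 and np.isclose(np.linalg.norm(p), delta):
            delta = min(2.0 * delta, delta_max)
        # Accept the step only if the model agreement is good enough.
        if rho > eta:
            x = x + p
    return x
```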
Solving the trust region subproblem
We will describe two algorithms for minimising $m_k(p)$: the Cauchy point and the
dogleg methods. The Cauchy point method first solves a linear version of $m_k$, defined as

$$p_k^s = \arg\min_p \left( f_k + g_k^T p \right) \quad \text{s.t.} \quad \|p\| \le \Delta_k.$$

Subsequently, $p_k^s$ is used to find the scalar $\tau_k > 0$ such that

$$\tau_k = \arg\min_{\tau \ge 0} m_k(\tau p_k^s) \quad \text{s.t.} \quad \|\tau p_k^s\| \le \Delta_k.$$

Finally, the Cauchy point is given as $p_k^C = \tau_k p_k^s$.
The solution to this problem can be shown to be

$$p_k^C = -\tau_k \frac{\Delta_k}{\|g_k\|} g_k,$$

where

$$\tau_k = \begin{cases} 1 & \text{if } g_k^T B_k g_k \le 0, \\ \min\left( \|g_k\|^3 / (\Delta_k\, g_k^T B_k g_k),\ 1 \right) & \text{otherwise.} \end{cases}$$
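In code, the closed form solution above might be translated as follows (a sketch; the function name is an illustrative choice):

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Cauchy point p^C = -tau * (delta / ||g||) * g for the current subproblem."""
    g_norm = np.linalg.norm(g)
    gBg = g @ B @ g
    if gBg <= 0:
        tau = 1.0
    else:
        tau = min(g_norm**3 / (delta * gBg), 1.0)
    return -tau * (delta / g_norm) * g
```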
The second method we describe is the dogleg method, which is applicable when $B_k$ is
a positive definite matrix. If the original Hessian is positive definite then this
method is directly applicable; otherwise one of the quasi-Newton positive definite
approximations to the Hessian could be used. The dogleg method is derived by
considering the path of the $p$ that minimises $m_k(p)$ as the trust region radius $\Delta_k$ increases,
which forms a curved path in parameter space. The method approximates this path with two
straight line segments. The first segment follows the steepest descent direction and is
given by

$$p_k^U = -\frac{g_k^T g_k}{g_k^T B_k g_k} g_k.$$

The second segment runs along the path between $p_k^U$ and the full Newton step $p_k^B = -B_k^{-1} g_k$. In the
case where $p_k^B$ is inside the trust region ($\|p_k^B\| \le \Delta_k$), then $p_k^B$
can be used without modification. Otherwise the point of intersection with the
trust-region radius must be calculated, which can be done by solving the following
quadratic equation

$$\left\| p_k^U + (\tau - 1)(p_k^B - p_k^U) \right\|^2 = \Delta_k^2,$$

with the second segment being defined by

$$\tilde{p}_k(\tau) = p_k^U + (\tau - 1)(p_k^B - p_k^U), \quad 1 \le \tau \le 2.$$
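Putting the two segments together, a minimal implementation might look like this (again a sketch, assuming NumPy and a positive definite $B_k$; the function name is an illustrative choice):

```python
import numpy as np

def dogleg(g, B, delta):
    """Dogleg step for positive definite B, following the two segments above."""
    p_b = -np.linalg.solve(B, g)                  # full Newton step p^B
    if np.linalg.norm(p_b) <= delta:
        return p_b                                # Newton step is inside the trust region
    p_u = -(g @ g) / (g @ B @ g) * g              # steepest descent minimiser p^U
    if np.linalg.norm(p_u) >= delta:
        return delta * p_u / np.linalg.norm(p_u)  # first segment already reaches the boundary
    # Intersect the second segment p^U + (tau - 1)(p^B - p^U) with the boundary ||p|| = delta.
    d = p_b - p_u
    a = d @ d
    b = 2.0 * (p_u @ d)
    c = p_u @ p_u - delta**2
    s = (-b + np.sqrt(b**2 - 4.0 * a * c)) / (2.0 * a)   # positive root gives tau - 1 in [0, 1]
    return p_u + s * d
```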
Problems
The Dog Leg
Let $f(x) = 10(x_2 - x_1^2)^2 + (1 - x_1)^2$. At $x = (0, -1)$ draw the contour lines of the quadratic model $m_k(p)$,
assuming that $B_k$ is the Hessian of $f$.
Draw the family of solutions of $\min_p m_k(p)$ such that $\|p\| \le \Delta_k$ as the trust region radius varies from $\Delta_k = 0$ to $\Delta_k = 2$.
Repeat this at $x = (0, 0.5)$.
Dogleg method
Write a program that implements the dogleg method. Choose $B_k$ to be the exact
Hessian. Apply it to minimise the function in (1) from the same two starting
points. If you wish, experiment with the update rule for the trust region by
changing the constants in the trust region algorithm given above, or by designing
your own rules.