Derivative-free methods
The line search and trust region methods introduced in the previous lesson all required
that the user be able to calculate the gradient $\nabla f$ of the function. However, in
many cases the gradient is either not available or too error-prone to be of use. For
example, the function might only be available as a compiled executable or as the result
of a physical experiment. The model might be stochastic, the gradient evaluation
might be noisy due to numerical inaccuracies, or the function might be of sufficient complexity that the
gradient is either unknown or too expensive to compute.
Here we describe two of the most common approaches to derivative-free optimisation: using
a finite difference approximation to the derivative, and the Nelder-Mead
algorithm, which is a Simplex search method.
However, there are a large number of derivative-free methods, ranging from classical Direct Search methods such as Pattern search, Simplex search, and Rosenbrock's or Powell's methods, to emulator- or model-based methods that build up a model of the function and minimise that model using a gradient-based method; a powerful example of this class is Bayesian Optimisation. Many global optimisation algorithms are also derivative-free, including randomised algorithms such as Simulated Annealing, evolution-based strategies such as the popular Covariance Matrix Adaptation Evolution Strategy (CMA-ES), and swarm algorithms inspired by bees or ants, such as Particle Swarm Optimisation.
Finite difference
The simplest way of converting a gradient-based optimisation algorithm to a derivative-free
one is to approximate the gradient of the function using finite differences.
The Finite Difference (FD) method is based on taking a Taylor series expansion of
$f(x + h)$ and $f(x - h)$ (and others) for a small parameter $h$ about $x$. Consider a
smooth function $f(x)$, then its Taylor expansion is

$$f(x + h) = f(x) + h f'(x) + \frac{h^2}{2} f''(x) + \frac{h^3}{6} f'''(x) + \cdots$$

$$f(x - h) = f(x) - h f'(x) + \frac{h^2}{2} f''(x) - \frac{h^3}{6} f'''(x) + \cdots$$

From this, we can compute three different schemes (approximations) to $f'(x)$:

Forward difference:

$$f'(x) = \frac{f(x + h) - f(x)}{h} + \mathcal{O}(h)$$

Backward difference:

$$f'(x) = \frac{f(x) - f(x - h)}{h} + \mathcal{O}(h)$$

Centered difference:

$$f'(x) = \frac{f(x + h) - f(x - h)}{2h} + \mathcal{O}(h^2)$$
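As a concrete illustration, here is a minimal sketch of the forward and centered difference schemes applied component-wise to approximate the gradient of a multivariate function (the step size `h = 1e-6` and the quadratic test function are arbitrary choices for demonstration):

```python
import numpy as np

def forward_difference(f, x, h=1e-6):
    # Component-wise forward difference approximation to the gradient of f at x
    x = np.asarray(x, dtype=float)
    fx = f(x)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - fx) / h
    return grad

def centered_difference(f, x, h=1e-6):
    # Component-wise centered difference approximation to the gradient of f at x
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

# Compare with the exact gradient [2*x0, 6*x1] of a simple quadratic
f = lambda x: x[0]**2 + 3 * x[1]**2
x = [1.0, 2.0]
print(forward_difference(f, x))   # approximately [2, 12]
print(centered_difference(f, x))  # approximately [2, 12]
```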
Finite difference approximations are easily computed, but suffer from inaccuracies which
can cause optimisation algorithms to fail or perform poorly. As well as the error in
the FD approximation itself (e.g. $\mathcal{O}(h^2)$ for centered difference), the function
evaluation itself might have some numerical or stochastic "noise". If this noise
dominates over the (small) step size $h$, then it is entirely probable that the
calculated steepest descent direction will not be a direction of descent for
$f$.
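A quick numerical experiment illustrates this. In the sketch below, a small amount of simulated noise (the noise level and the step sizes are arbitrary choices) is added to each function evaluation; the forward difference estimate of the derivative degrades badly once $h$ is comparable to, or smaller than, the noise.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility
noise_level = 1e-8              # assumed magnitude of the evaluation "noise"

def noisy_f(x):
    # f(x) = x^2 plus a small random perturbation mimicking numerical/stochastic error
    return x**2 + noise_level * rng.standard_normal()

x = 1.0  # the exact derivative of x^2 at x = 1 is 2
for h in [1e-2, 1e-5, 1e-8, 1e-10]:
    fd = (noisy_f(x + h) - noisy_f(x)) / h
    print(f"h = {h:.0e}: forward difference = {fd:.4f}")
```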
Software
It is very common for optimisation libraries to provide a finite difference approximation
to the Jacobian if it is not supplied, as is done for the gradient-based
methods in
scipy.optimize
. More dedicated libraries can give superior approximations to the gradient, like the
numdifftools
package. This
library provides higher-order FD approximations and Richardson extrapolation to
evaluate the limit as $h \to 0$, and can calculate Jacobians and Hessians of
user-supplied functions.
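As a rough sketch of both options (assuming numdifftools is installed; the quadratic test function is an arbitrary choice):

```python
import numpy as np
import numdifftools as nd
from scipy.optimize import minimize

# Illustrative test function: f(x) = x0^2 + 3*x1^2
f = lambda x: x[0]**2 + 3 * x[1]**2
x = np.array([1.0, 2.0])

# scipy.optimize.minimize falls back to a finite difference approximation
# of the gradient when none is supplied
res = minimize(f, x, method='BFGS')
print(res.x)  # close to [0, 0]

# numdifftools uses higher-order FD approximations with Richardson extrapolation
grad = nd.Gradient(f)
hess = nd.Hessian(f)
print(grad(x))  # approximately [2, 12]
print(hess(x))  # approximately [[2, 0], [0, 6]]
```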
Problems
Comparing optimisation methods
Use the following methods from
scipy.optimize
to minimize the 2D Rosenbrock function:
- Nelder-Mead Simplex
- Conjugate Gradient
- BFGS Quasi-Newton
- Newton-CG
- SHG Global Optimisation
Either use the suggested starting point, or experiment with your own.
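A possible starting skeleton is sketched below (the starting point and the bounds for the global optimiser are illustrative choices, not prescribed values):

```python
import numpy as np
from scipy.optimize import minimize, shgo, rosen, rosen_der, rosen_hess

x0 = np.array([-1.2, 1.0])  # illustrative starting point

# Nelder-Mead uses no derivatives; CG and BFGS fall back to finite differences
for method in ['Nelder-Mead', 'CG', 'BFGS']:
    res = minimize(rosen, x0, method=method)
    print(method, res.x, res.nfev)

# Newton-CG requires the gradient (and optionally the Hessian) to be supplied
res = minimize(rosen, x0, method='Newton-CG', jac=rosen_der, hess=rosen_hess)
print('Newton-CG', res.x, res.nfev)

# SHG global optimisation searches over a bounding box rather than a start point
res = shgo(rosen, bounds=[(-2.0, 2.0), (-2.0, 2.0)])
print('SHGO', res.x, res.nfev)
```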