Unconstrained nonlinear least squares
In this part we see how to solve problems of the form:
\[\min\limits_{\theta} \frac{1}{2}\|r(\theta)\|^2\]
For that we have to call the solve(nls::AbstractNLS, θ_init::AbstractVector, conf::Abstract_Solver_Conf) function.
One must define:
- an AbstractNLS instance, which is the subject of the Problem definition section
- an Abstract_Solver_Conf instance, which is the subject of the Choose a solver section
You can reproduce the results below using sandbox/example_Rosenbrock.jl
Problem definition
For this example we will use the classical Rosenbrock function:
\[(\theta_1,\theta_2) \mapsto (1-\theta_1)^2 + 100(\theta_2-\theta_1^2)^2\]
which can be viewed as a nonlinear least squares problem, with:
\[\frac{1}{2}\|r(\theta)\|^2\text{ where }r = \sqrt{2} \left( \begin{array}{c} 1-\theta_1 \\ 10(\theta_2-\theta_1^2) \end{array} \right)\]
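As a quick sanity check (plain Julia, independent of the package; the names rosenbrock, r and θ_test below are only introduced for this verification), one can confirm that $\frac{1}{2}\|r(\theta)\|^2$ reproduces the original Rosenbrock value:
# Verify that 1/2 * ‖r(θ)‖² equals the Rosenbrock function at an arbitrary point
rosenbrock(θ) = (1 - θ[1])^2 + 100 * (θ[2] - θ[1]^2)^2
r(θ) = sqrt(2) * [1 - θ[1], 10 * (θ[2] - θ[1]^2)]
θ_test = [0.5, -1.3]
@assert 0.5 * sum(abs2, r(θ_test)) ≈ rosenbrock(θ_test)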
The first task is to wrap the problem. The easiest way is to let the ForwardDiff.jl package compute the Jacobian. In that case you only have to provide the residue function $r(\theta)$. For that, call create_NLS_problem_using_ForwardDiff:
nls = create_NLS_problem_using_ForwardDiff(2 => 2) do θ
sqrt(2)* [ 1-θ[1], 10*(θ[2]-θ[1]^2) ]
end
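Before solving, you can sanity-check the wrapped problem. The sketch below assumes residue_size and eval_r are reachable through the module name (as in the annex at the end of this page) and uses the known minimizer $(1,1)$, where the residue must vanish:
# The residue of the wrapped problem must vanish at the known minimizer θ = (1, 1)
@assert parameter_size(nls) == 2
@assert NLS_Solver.residue_size(nls) == 2
@assert NLS_Solver.eval_r(nls, [1.0, 1.0]) ≈ [0.0, 0.0]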
The returned nls is an instance of AbstractNLS.
Do not specify a concrete element type, such as Float64:
nls = create_NLS_problem_using_ForwardDiff(2 => 2) do θ
sqrt(2)* Float64[ 1-θ[1], 10*(θ[2]-θ[1]^2) ]
end
This would prevent ForwardDiff from propagating its dual numbers, which it needs to compute the Jacobian.
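This behaviour can be reproduced with ForwardDiff alone (an illustrative sketch: f_bad is a throwaway name and the exact error type may differ):
using ForwardDiff
# Forcing Float64 elements makes the residue reject ForwardDiff's dual numbers
f_bad(θ) = Float64[1 - θ[1], 10 * (θ[2] - θ[1]^2)]
ForwardDiff.jacobian(f_bad, [0.0, 0.0])   # errors: dual numbers cannot be converted to Float64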
An alternative, if you do not want to use the do...end syntax, would be:
Rosenbrock(θ::AbstractVector{T}) where T = sqrt(2)* T[ 1-θ[1], 10*(θ[2]-θ[1]^2) ]
nls = create_NLS_problem_using_ForwardDiff(Rosenbrock,2 => 2);
If you do not want to use create_NLS_problem_using_ForwardDiff, you can directly sub-type AbstractNLS by defining all the required methods. This is described in the Annex: direct sub-typing of AbstractNLS section.
Choose a solver
Algorithm parameters are defined by sub-typing Abstract_Solver_Conf. This structure is then used to identify the selected algorithm. For the moment there is only one implementation, the classical Levenberg-Marquardt method:
conf = LevenbergMarquardt_Conf()
LevenbergMarquardt_Conf(1000, 1.0e-8, 1.0e-8, 0.001)
We also need a starting point for the $\theta$ parameter vector. We can create a zero-filled vector:
θ_init = zeros(parameter_size(nls))
2-element Vector{Float64}:
0.0
0.0
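The starting point does not have to be the origin; any vector whose length matches parameter_size(nls) is accepted. For illustration only (θ_init_alt is a hypothetical name, the run below keeps the zero-filled vector):
# An arbitrary alternative starting point, not used in the run below
θ_init_alt = [-1.2, 1.0]
@assert length(θ_init_alt) == parameter_size(nls)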
The solve() function
To solve the problem you simply have to call the solve() function. For unconstrained problems, this function has the following prototype:
function solve(nls::AbstractNLS,
θ_init::AbstractVector,
conf::Abstract_Solver_Conf)::Abstract_Solver_Result
In our case this gives:
result = solve(nls, θ_init, conf)
LevenbergMarquardt_Result{Float64}(true, 17, 3.0846066557944157e-19, [0.9999999994450625, 0.9999999988878777])
Using the solver result
The solve() function returns an Abstract_Solver_Result sub-typed structure that contains the algorithm results.
In particular, you can check if the method has converged:
@assert converged(result)
and get the optimal θ:
θ_solution = solution(result)
2-element ReadOnlyArrays.ReadOnlyArray{Float64, 1, Vector{Float64}}:
0.9999999994450625
0.9999999988878777
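In a script you may prefer to guard the accessors instead of asserting; a minimal sketch that only relies on the converged and solution accessors shown above:
# Defensive pattern: read the solution only when the solver reports convergence
if converged(result)
    θ_solution = solution(result)
    println("converged to ", θ_solution)
else
    @warn "solver did not converge, result discarded"
end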
Annex: direct sub-typing of AbstractNLS
To wrap the objective function, you can sub-type AbstractNLS. In that case, you have to define everything, including the function that computes the Jacobian.
More precisely, you have to define 4 methods:
- parameter_size: returns the length of the $θ$ parameter vector
- residue_size: returns the length of the $r$ residue vector
- eval_r: computes the residue $r$ value
- eval_r_J: computes the residue $r$ value and its Jacobian matrix with respect to $\theta$
For the Rosenbrock function this gives:
struct Rosenbrock <: NLS_Solver.AbstractNLS
end
import NLS_Solver: parameter_size, residue_size, eval_r, eval_r_J
NLS_Solver.parameter_size(::Rosenbrock) = 2
NLS_Solver.residue_size(::Rosenbrock) = 2
function NLS_Solver.eval_r(nls::Rosenbrock,θ::AbstractVector{T}) where T
@assert length(θ)==parameter_size(nls)
sqrt(2)* T[ 1-θ[1], 10*(θ[2]-θ[1]^2) ]
end
function NLS_Solver.eval_r_J(nls::Rosenbrock,θ::AbstractVector{T}) where T
@assert length(θ)==parameter_size(nls)
r = sqrt(2)* T[ 1-θ[1], 10*(θ[2]-θ[1]^2) ]
J = sqrt(2)* T[ -1 0; -20*θ[1] 10]
(r,J)
end
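Before trusting a hand-coded eval_r_J it is worth comparing it with automatic differentiation; the sketch below (θ_check is just an arbitrary test point) checks the Jacobian against ForwardDiff.jacobian:
using ForwardDiff
# Compare the hand-written Jacobian with an automatic-differentiation reference
θ_check = [0.7, -0.4]
r_check, J_check = NLS_Solver.eval_r_J(Rosenbrock(), θ_check)
J_ad = ForwardDiff.jacobian(θ -> NLS_Solver.eval_r(Rosenbrock(), θ), θ_check)
@assert J_check ≈ J_ad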
All the remaining parts are identical:
nls = Rosenbrock()
conf = LevenbergMarquardt_Conf()
θ_init = zeros(parameter_size(nls))
result = solve(nls, θ_init, conf)
LevenbergMarquardt_Result{Float64}(true, 17, 3.0846066557944157e-19, [0.9999999994450625, 0.9999999988878777])
We can check that we find the same solution:
θ_solution = solution(result)
2-element ReadOnlyArrays.ReadOnlyArray{Float64, 1, Vector{Float64}}:
0.9999999994450625
0.9999999988878777
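Since the global minimizer of the Rosenbrock function is $(1, 1)$, a last plain-Julia check confirms that the computed solution recovers it:
# The Rosenbrock minimum is at (1, 1); the computed solution matches it numerically
@assert θ_solution ≈ [1.0, 1.0]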