Unconstrained nonlinear least squares
In this part we see how to solve problems of the form:
\[\min\limits_{\theta} \frac{1}{2}\|r(\theta)\|^2\]
For that we have to call the
solve(nls::AbstractNLS, θ_init::AbstractVector, conf::Abstract_Solver_Conf)
function.
Where:
- Problem definition section shows how to create an
AbstractNLS
instance, - Choose a solver section shows how to create an
Abstract_Solver_Conf
instance.
You can reproduce the results below using the sandbox/example_Rosenbrock.jl
file.
Problem definition
This example uses the classical Rosenbrock function:
\[(\theta_1,\theta_2) \mapsto (1-\theta_1)^2 + 100(\theta_2-\theta_1^2)^2\]
which can be viewed as a nonlinear least squares problem, with:
\[\frac{1}{2}\|r(\theta)\|^2\text{ where }r = \sqrt{2} \left( \begin{array}{c} 1-\theta_1 \\ 10(\theta_2-\theta_1^2) \end{array} \right)\]
The first task is to wrap this problem. The easiest way is to provide the residue function $r$ and let the ForwardDiff.jl package computes its Jacobian. This is the role of the create_NLS_problem_using_ForwardDiff
wrapper:
nls = create_NLS_problem_using_ForwardDiff(2 => 2) do θ
sqrt(2)* [ 1-θ[1], 10*(θ[2]-θ[1]^2) ]
end
The returned nls
is an instance of AbstractNLS
.
Do not specify a type, like Float64
nls = create_NLS_problem_using_ForwardDiff(2 => 2) do θ
sqrt(2)* Float64[ 1-θ[1], 10*(θ[2]-θ[1]^2) ]
end
This would prevent the use of dual numbers to compute the Jacobian.
An alternative, if you do not want to use the do...end
syntax, would be:
Rosenbrock(θ::AbstractVector{T}) where T = sqrt(2)* T[ 1-θ[1], 10*(θ[2]-θ[1]^2) ]
nls = create_NLS_problem_using_ForwardDiff(Rosenbrock,2 => 2);
If you do not want to use create_NLS_problem_using_ForwardDiff
you can directly sub-type AbstractNLS
by defining all the required methods. This is described in direct sub-typing of AbstractNLS
Choose a solver
The algorithm and its parameters are defined by sub-typing Abstract_Solver_Conf
. For the moment there is only one implementation, the classical Levenberg-Marquardt method:
conf = LevenbergMarquardt_Conf()
You can have a look at LevenbergMarquardt_Conf
for further details.
We also need a starting point for the $\theta$ parameter vector, here we create a zero filled vector:
θ_init = zeros(parameter_size(nls))
2-element Vector{Float64}:
0.0
0.0
The solve()
function
To solve the problem one must call the solve()
function. For unconstrained problems, this function has the following prototype
function solve(nls::AbstractNLS,
θ_init::AbstractVector,
conf::Abstract_Solver_Conf)::Abstract_Solver_Result
In our case this gives
result = solve(nls, θ_init, conf)
Using solver result
The solve()
function returns a Abstract_Solver_Result
sub-typed structure that contains algorithm result.
In peculiar you can check if the method has converged:
@assert converged(result)
and get the founded solution:
θ_solution = solution(result)
2-element ReadOnlyArrays.ReadOnlyArray{Float64, 1, Vector{Float64}}:
0.9999999994450625
0.9999999988878777
Annex: direct sub-typing of AbstractNLS
To define a nonlinear least squares problem, you can sub-type AbstractNLS
. In that case, you have to define everything, including the function that computes the residue Jacobian.
More precisely, you have to define 4 methods:
parameter_size
: returns the $θ$ parameter vector lengthresidue_size
: returns the $r$ residue vector lengtheval_r
: computes the residue $r$ valueeval_r_J
: computes the residue $r$ value and its Jacobian matrix wrt to $\theta$.
For the Rosenbrock function this gives:
struct Rosenbrock <: NLS_Solver.AbstractNLS
end
import NLS_Solver: parameter_size, residue_size, eval_r, eval_r_J
NLS_Solver.parameter_size(::Rosenbrock) = 2
NLS_Solver.residue_size(::Rosenbrock) = 2
function NLS_Solver.eval_r(nls::Rosenbrock,θ::AbstractVector{T}) where T
@assert length(θ)==parameter_size(nls)
sqrt(2)* T[ 1-θ[1], 10*(θ[2]-θ[1]^2) ]
end
function NLS_Solver.eval_r_J(nls::Rosenbrock,θ::AbstractVector{T}) where T
@assert length(θ)==parameter_size(nls)
r = sqrt(2)* T[ 1-θ[1], 10*(θ[2]-θ[1]^2) ]
J = sqrt(2)* T[ -1 0; -20*θ[1] 10]
(r,J)
end
All the remaining parts are identical to the previously seen steps:
nls = Rosenbrock()
conf = LevenbergMarquardt_Conf()
θ_init = zeros(parameter_size(nls))
result = solve(nls, θ_init, conf)
The solution is hopefully identical to the one we found before:
θ_solution = solution(result)
2-element ReadOnlyArrays.ReadOnlyArray{Float64, 1, Vector{Float64}}:
0.9999999994450625
0.9999999988878777