Unconstrained nonlinear least squares

In this part we see how to solve problems of the form:

\[\min\limits_{\theta} \frac{1}{2}\|r(\theta)\|^2\]

For that we have to call the solve(nls::AbstractNLS, θ_init::AbstractVector, conf::Abstract_Solver_Conf).

One must define:

You can reproduce the results below using sandbox/example_Rosenbrock.jl

Problem definition

For this example we will use the classical Rosenbrock function:

\[(\theta_1,\theta_2) \mapsto (1-\theta_1)^2 + 100(\theta_2-\theta_1^2)^2\]

which can be viewed as a nonlinear least squares problem, with:

\[\frac{1}{2}\|r(\theta)\|^2\text{ where }r = \sqrt{2} \left( \begin{array}{c} 1-\theta_1 \\ 10(\theta_2-\theta_1^2) \end{array} \right)\]

The first task is to wrap the problem. The easiest way is to let the ForwardDiff.jl package computes the Jacobian. In that case you only have to provide the objective function. For that call the create_NLS_problem_using_ForwardDiff :

nls = create_NLS_problem_using_ForwardDiff(2 => 2) do θ
  sqrt(2)* [ 1-θ[1], 10*(θ[2]-θ[1]^2) ]
end

The returned nls is an instance of AbstractNLS.

Danger

Do not specify a type, like Float64

nls = create_NLS_problem_using_ForwardDiff(2 => 2) do θ
    sqrt(2)* Float64[ 1-θ[1], 10*(θ[2]-θ[1]^2) ]
end

This would prevent the use of dual numbers to compute the Jacobian.

Note

An alternative, if you do not want to use the do...end syntax, would be:

Rosenbrock(θ::AbstractVector{T}) where T = sqrt(2)* T[ 1-θ[1], 10*(θ[2]-θ[1]^2) ]

nls = create_NLS_problem_using_ForwardDiff(Rosenbrock,2 => 2);
Note

If you do not want to use create_NLS_problem_using_ForwardDiff you can directly sub-type AbstractNLS by defining all the required method. This is described in direct sub-typing of AbstractNLS

Choose a solver

Algorithm parameters are defined by sub-typing Abstract_Solver_Conf. This structure is then used to identify the selected algorithm. For the moment there is only one implementation, the classical Levenberg-Marquardt method:

conf = LevenbergMarquardt_Conf()
LevenbergMarquardt_Conf(1000, 1.0e-8, 1.0e-8, 0.001)

We also need a starting point for the $\theta$ parameter vector. We can create a zero filled vector:

θ_init = zeros(parameter_size(nls))
2-element Vector{Float64}:
 0.0
 0.0

The solve() function

To solve the problem you simply have to call the solve() function. For unconstrained problems, this function has the following prototype

function solve(nls::AbstractNLS,
               θ_init::AbstractVector,
               conf::Abstract_Solver_Conf)::Abstract_Solver_Result

In our case this gives

result = solve(nls, θ_init, conf)
LevenbergMarquardt_Result{Float64}(true, 17, 3.0846066557944157e-19, [0.9999999994450625, 0.9999999988878777])

Using solver result

The solve() function returns a Abstract_Solver_Result sub-typed structure that contains algorithm result.

In peculiar you can check if the method has converged

@assert converged(result)

and get the optimal θ

θ_solution = solution(result)
2-element ReadOnlyArrays.ReadOnlyArray{Float64, 1, Vector{Float64}}:
 0.9999999994450625
 0.9999999988878777

Annex: direct sub-typing of AbstractNLS

To wrap the objective function, you can sub-type AbstractNLS. In that case, you have to define everything, including the function that computes the Jacobian.

More precisely, you have to define 4 methods:

  • parameter_size : returns the $θ$ parameter vector length
  • residue_size : returns the $r$ residue vector length
  • eval_r : computes the residue $r$ value
  • eval_r_J : computes the residue $r$ value and its Jacobian matrix wrt to $\theta$.

For the Rosenbrock function this gives:

struct Rosenbrock <: NLS_Solver.AbstractNLS
end

import NLS_Solver: parameter_size, residue_size, eval_r, eval_r_J

NLS_Solver.parameter_size(::Rosenbrock) = 2
NLS_Solver.residue_size(::Rosenbrock) = 2

function NLS_Solver.eval_r(nls::Rosenbrock,θ::AbstractVector{T}) where T
    @assert length(θ)==parameter_size(nls)

    sqrt(2)* T[ 1-θ[1], 10*(θ[2]-θ[1]^2) ]
end

function NLS_Solver.eval_r_J(nls::Rosenbrock,θ::AbstractVector{T}) where T
    @assert length(θ)==parameter_size(nls)

    r = sqrt(2)* T[ 1-θ[1], 10*(θ[2]-θ[1]^2) ]
    J = sqrt(2)* T[ -1 0; -20*θ[1] 10]

    (r,J)
end

All the remaining parts are identical:

nls = Rosenbrock()

conf = LevenbergMarquardt_Conf()
θ_init = zeros(parameter_size(nls))
result = solve(nls, θ_init, conf)
LevenbergMarquardt_Result{Float64}(true, 17, 3.0846066557944157e-19, [0.9999999994450625, 0.9999999988878777])

We can check that we find the same solution:

θ_solution = solution(result)
2-element ReadOnlyArrays.ReadOnlyArray{Float64, 1, Vector{Float64}}:
 0.9999999994450625
 0.9999999988878777