我正在使用Optim.jl库来使用BFGS算法最小化Julia中的函数。今天,我已经向question询问了同一个图书馆,但为了避免混淆,我决定将其分成两部分。
我还希望在优化后得到负逆Hessian的估计值,以便进一步计算。
在Optim库的GitHub网站上,我找到了以下工作示例:
using Optim
rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
result = optimize(rosenbrock, zeros(2), BFGS())
如何在结果优化后获得负逆Hessian?在它中是否有任何字段可以识别Hessian,Inverse Hessian或负Hessian;
修改
感谢您的评论。你认为编辑“optimize.jl”会更有效率,这样函数也会返回逆Hessian吗?请参阅下面的工作示例 - 编辑已在第226行引入:
if state.method_string == "BFGS"
return MultivariateOptimizationResults(state.method_string,
initial_x,
f_increased ? state.x_previous : state.x,
f_increased ? state.f_x_previous : state.f_x,
iteration,
iteration == options.iterations,
x_converged,
options.x_tol,
f_converged,
options.f_tol,
g_converged,
options.g_tol,
f_increased,
tr,
state.f_calls,
state.g_calls,
state.h_calls), state.invH
else
return MultivariateOptimizationResults(state.method_string,
initial_x,
f_increased ? state.x_previous : state.x,
f_increased ? state.f_x_previous : state.f_x,
iteration,
iteration == options.iterations,
x_converged,
options.x_tol,
f_converged,
options.f_tol,
g_converged,
options.g_tol,
f_increased,
tr,
state.f_calls,
state.g_calls,
state.h_calls)
end
或者只是:
return MultivariateOptimizationResults(state.method_string,
initial_x,
f_increased ? state.x_previous : state.x,
f_increased ? state.f_x_previous : state.f_x,
iteration,
iteration == options.iterations,
x_converged,
options.x_tol,
f_converged,
options.f_tol,
g_converged,
options.g_tol,
f_increased,
tr,
state.f_calls,
state.g_calls,
state.h_calls), state
在优化后完全访问“状态”。
编辑2
由于此更改将在新版本的Optim.jl库中引入,因此无需继续此讨论。目前, extended_trace 和 after_while!技巧都有效。就个人而言,我更喜欢后者,所以我将结束讨论,给予@Dan Getz正确答案。
答案 0 :(得分:5)
我知道一种方法,但是它是否值得,而不是自己估计逆Hessian,我不确定。如果您传递Optim.Options(store_trace=true, extended_trace=true)
,则可以获得包含最后~invH的优化路径的记录。例如,在
result = optimize(rosenbrock, zeros(2), BFGS(),
Optim.Options(store_trace=true, extended_trace=true));
我们可以得到
julia> result.trace[end]
16 5.471433e-17 2.333740e-09
* Current step size: 1.0091356566200325
* g(x): [2.33374e-9,-1.22984e-9]
* ~inv(H): [0.498092 0.996422; 0.996422 1.9983]
* x: [1.0,1.0]
julia> result.trace[end].metadata["~inv(H)"]
2×2 Array{Float64,2}:
0.498092 0.996422
0.996422 1.9983
但至少有两件事我不喜欢这种方法:
首先,打开extended_trace=true
似乎强制show_trace=true
- 您会注意到我没有显示计算的输出!感觉像个臭虫。通过将show_every
设置为大的,或通过完全重定向stdout来避免这种情况,可以减轻这种情况(虽然不会删除)。
第二个问题是我们应该可以在不存储整个路径的情况下访问最后一个状态,但这实际上是否是一个问题取决于问题的大小。
答案 1 :(得分:5)
另一个不太理想的方法是连接到内部Optim函数after_while!
,该函数当前无效并使用它来从上一个状态中提取信息。
在Julia代码中,这看起来像:
julia> using Optim
julia> rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
rosenbrock (generic function with 1 method)
julia> Optim.after_while!{T}(d, state::Optim.BFGSState{T}, method::BFGS, options)
= global invH = state.invH
julia> result = optimize(rosenbrock, zeros(2), BFGS())
Results of Optimization Algorithm
* Algorithm: BFGS
* Starting Point: [0.0,0.0]
* Minimizer: [0.9999999926033423,0.9999999852005353]
* Minimum: 5.471433e-17
* Iterations: 16
* Convergence: true
* |x - x'| < 1.0e-32: false
* |f(x) - f(x')| / |f(x)| < 1.0e-32: false
* |g(x)| < 1.0e-08: true
* f(x) > f(x'): false
* Reached Maximum Number of Iterations: false
* Objective Function Calls: 69
* Gradient Calls: 69
julia> invH
2×2 Array{Float64,2}:
0.498092 0.996422
0.996422 1.9983
这对于使用全局变量并依赖于在运行/编译after_while!
之前定义optimize
没有吸引力(但可能在v0.6中已经解决了)。
正如@DSM在他的回答中所指出的,希望访问最后一个优化器状态是很自然的。如果追踪不是答案,也许这就是。
答案 2 :(得分:1)
目前在Optim.jl中执行此操作的最简单方法是执行以下操作。
首先,加载Optim和OptimTestProblems(以便有一个例子来解决)
julia> using Optim, OptimTestProblems
julia> prob = OptimTestProblems.UnconstrainedProblems.examples["Himmelblau"]
OptimTestProblems.MultivariateProblems.OptimizationProblem{Void,Void,Float64,String,Void}("Himmelblau", OptimTestProblems.MultivariateProblems.UnconstrainedProblems.himmelblau, OptimTestProblems.MultivariateProblems.UnconstrainedProblems.himmelblau_gradient!, nothing, OptimTestProblems.MultivariateProblems.UnconstrainedProblems.himmelblau_hessian!, nothing, [2.0, 2.0], [3.0, 2.0], 0.0, true, true, nothing)
然后指定所有部分optimize
需要并按正确顺序输入:
julia> x0 = prob.initial_x
2-element Array{Float64,1}:
2.0
2.0
julia> obj = OnceDifferentiable(prob.f, prob.g!, x0)
NLSolversBase.OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1},Val{false}}(OptimTestProblems.MultivariateProblems.UnconstrainedProblems.himmelblau, OptimTestProblems.MultivariateProblems.UnconstrainedProblems.himmelblau_gradient!, NLSolversBase.fg!, 0.0, [NaN, NaN], [NaN, NaN], [NaN, NaN], [0], [0])
julia> m = BFGS()
Optim.BFGS{LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64},Optim.##67#69}(LineSearches.InitialStatic{Float64}
alpha: Float64 1.0
scaled: Bool false
, LineSearches.HagerZhang{Float64}
delta: Float64 0.1
sigma: Float64 0.9
alphamax: Float64 Inf
rho: Float64 5.0
epsilon: Float64 1.0e-6
gamma: Float64 0.66
linesearchmax: Int64 50
psi3: Float64 0.1
display: Int64 0
, Optim.#67, Optim.Flat())
julia> options = Optim.Options()
Optim.Options{Float64,Void}(1.0e-32, 1.0e-32, 1.0e-8, 0, 0, 0, false, 0, 1000, false, false, false, 1, nothing, NaN)
julia> bfgsstate = Optim.initial_state(m, options, obj, x0)
Optim.BFGSState{Array{Float64,1},Array{Float64,2},Float64,Array{Float64,1}}([2.0, 2.0], [6.91751e-310, 6.9175e-310], [-42.0, -18.0], NaN, [6.9175e-310, 0.0], [6.91751e-310, 0.0], [6.91751e-310, 0.0], [1.0 0.0; 0.0 1.0], [6.91751e-310, 0.0], NaN, [6.9175e-310, 6.91751e-310], 1.0, false, LineSearches.LineSearchResults{Float64}(Float64[], Float64[], Float64[], 0))
julia> res = optimize(obj, x0, m, options, bfgsstate)
Results of Optimization Algorithm
* Algorithm: BFGS
* Starting Point: [2.0,2.0]
* Minimizer: [2.9999999999998894,2.000000000000162]
* Minimum: 5.406316e-25
* Iterations: 7
* Convergence: true
* |x - x'| ≤ 1.0e-32: false
|x - x'| = 5.81e-09
* |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
|f(x) - f(x')| = 2.93e+09 |f(x)|
* |g(x)| ≤ 1.0e-08: true
|g(x)| = 4.95e-12
* Stopped by an increasing objective: false
* Reached Maximum Number of Iterations: false
* Objective Calls: 42
* Gradient Calls: 42
然后我们可以从optimize
中变异的状态访问逆Hessian。
julia> bfgsstate.invH
2×2 Array{Float64,2}:
0.0160654 -0.00945561
-0.00945561 0.034967
并将其与通过计算实际Hessian的倒数得到的逆Hessian进行比较。
julia> H=similar(bfgsstate.invH)
2×2 Array{Float64,2}:
6.91751e-310 6.91751e-310
6.91751e-310 6.91751e-310
julia> prob.h!(H, Optim.minimizer(res))
34.00000000000832
julia> H
2×2 Array{Float64,2}:
74.0 20.0
20.0 34.0
julia> inv(H)
2×2 Array{Float64,2}:
0.0160681 -0.0094518
-0.0094518 0.0349716
这类似于在BFGS运行的最后一步获得的那个。