Question

I need to minimize a cost function for a large number (1000s) of different inputs. Obviously, this can be implemented by looping over scipy.optimize.minimize or any other minimization routine. Here is an example:

    import numpy as np
    import scipy as sp
    import scipy.optimize

    def cost(x, a, b):
        return np.sum((np.sum(a * x.reshape(a.shape), axis=1) - b)**2)

    a = np.random.randn(500, 40)
    b = np.array(np.arange(500))

    x = []
    for i in range(a.shape[0]):
        res = sp.optimize.minimize(cost, np.zeros(40), args=(a[None, i], b[None, i]))
        x.append(res.x)

It finds the x[i, :] that minimizes cost for each a[i, :] and b[i], but this is very slow. I guess looping over minimize causes considerable overhead.

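A partial solution is to solve for all x simultaneously:

    res = sp.optimize.minimize(cost, np.zeros(a.size), args=(a, b))

This is even slower than the loop. minimize does not know that the elements of x are group-wise independent. So it computes the full Hessian, even though a block-diagonal matrix would be sufficient given the problem structure. This is slow and overflows my computer's memory.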

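Is there any way to inform minimize or another optimization function about the problem structure, so that it can solve multiple independent optimizations in a single function call? (Similar to certain options supported by Matlab's fsolve.)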
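For a sense of scale, here is a rough estimate of the memory involved in the joint problem. It is only a sketch: it assumes the optimizer keeps a dense float64 matrix of second-order information with one row and column per unknown (the default BFGS method keeps a dense inverse-Hessian approximation of that size), and the variable names are illustrative.

```python
# Sizes taken from the example in the question.
n_problems, n_unknowns = 500, 40
n_total = n_problems * n_unknowns      # 20000 unknowns in the joint problem
bytes_per_float = 8                    # float64

# Dense (n_total x n_total) matrix versus 500 independent (40 x 40) blocks.
dense_hessian_bytes = n_total**2 * bytes_per_float
block_diag_bytes = n_problems * n_unknowns**2 * bytes_per_float

print(f"dense matrix:   {dense_hessian_bytes / 1e9:.1f} GB")   # 3.2 GB
print(f"block diagonal: {block_diag_bytes / 1e6:.1f} MB")      # 6.4 MB
```

The factor between the two is exactly the number of independent problems (500), which is why telling the solver about the structure matters.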

Answer 1

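First, the solution:

It turns out that scipy.optimize.least_squares supports exploiting the structure of the Jacobian by setting the jac_sparsity parameter.

The least_squares function works slightly differently from minimize, so the cost function needs to be rewritten to return the residuals instead: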

def residuals(x, a, b):
    return np.sum(a * x.reshape(a.shape), axis=1) - b
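As a quick check of how the two formulations relate, the sketch below evaluates both functions at an arbitrary point of a tiny made-up problem (3 sub-problems with 2 unknowns each; the sizes and the random point are assumptions for illustration). The scalar cost is the sum of squared residuals. least_squares minimizes half of that sum with its default linear loss, which has the same minimizer.

```python
import numpy as np

def cost(x, a, b):
    return np.sum((np.sum(a * x.reshape(a.shape), axis=1) - b)**2)

def residuals(x, a, b):
    return np.sum(a * x.reshape(a.shape), axis=1) - b

rng = np.random.default_rng(0)
a = rng.standard_normal((3, 2))   # 3 independent problems, 2 unknowns each
b = np.arange(3)
x = rng.standard_normal(a.size)   # arbitrary test point, flattened

r = residuals(x, a, b)
print(r.shape)                                   # (3,) one residual per problem
print(np.isclose(cost(x, a, b), np.sum(r**2)))   # True
```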

The Jacobian has a block-diagonal sparsity structure, given by

jacs = sp.sparse.block_diag([np.ones((1, 40), dtype=bool)]*500)
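To see what this pattern looks like, here is the same construction on a tiny case (3 problems with 2 unknowns each, sizes chosen only for illustration). Row i of the Jacobian belongs to residual i, which depends only on the unknowns of problem i, so each row has a single run of ones.

```python
import numpy as np
import scipy.sparse

n_problems, n_unknowns = 3, 2   # illustrative sizes, not the 500 x 40 case
jacs = scipy.sparse.block_diag([np.ones((1, n_unknowns), dtype=bool)] * n_problems)

print(jacs.shape)                    # (3, 6)
print(jacs.toarray().astype(int))
# [[1 1 0 0 0 0]
#  [0 0 1 1 0 0]
#  [0 0 0 0 1 1]]
```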

Calling the optimization routine:

res = sp.optimize.least_squares(residuals, np.zeros(500*40),
                                jac_sparsity=jacs, args=(a, b))
x = res.x.reshape(500, 40)
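A minimal end-to-end sketch of the same call on a hand-made 3 x 3 problem may help when trying this out. The small arrays a and b are assumptions chosen for illustration. The check at the end only confirms that every sub-problem's residual has been driven close to zero.

```python
import numpy as np
import scipy.optimize
import scipy.sparse

def residuals(x, a, b):
    return np.sum(a * x.reshape(a.shape), axis=1) - b

# Three independent problems with three unknowns each (made-up data).
a = np.array([[1.0, 2.0, 2.0],
              [2.0, 0.0, 1.0],
              [0.0, 3.0, 4.0]])
b = np.array([3.0, 5.0, 10.0])

# One row of ones per problem: residual i depends only on x[i, :].
jacs = scipy.sparse.block_diag([np.ones((1, a.shape[1]), dtype=bool)] * a.shape[0])

res = scipy.optimize.least_squares(residuals, np.zeros(a.size),
                                   jac_sparsity=jacs, args=(a, b))
x = res.x.reshape(a.shape)

print(x.shape)                                     # (3, 3)
print(np.max(np.abs(residuals(res.x, a, b))))      # close to zero
```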

But is it really faster?

%timeit opt1_loopy_min(a, b)        # 1 loop, best of 3: 2.43 s per loop
%timeit opt2_loopy_min_start(a, b)  # 1 loop, best of 3: 2.55 s per loop
%timeit opt3_loopy_lsq(a, b)        # 1 loop, best of 3: 13.7 s per loop
%timeit opt4_dense_lsq(a, b)        # ValueError: array is too big; ...
%timeit opt5_jacs_lsq(a, b)         # 1 loop, best of 3: 1.04 s per loop

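Conclusions:

There is no noticeable difference between the original solution (opt1) and re-using the starting point (opt2) without sorting.
Looping over least_squares (opt3) is considerably slower than looping over minimize (opt1, opt2).
The problem is too large to run naively with least_squares (opt4), because the Jacobian matrix does not fit in memory.
Exploiting the sparsity structure of the Jacobian in least_squares (opt5) seems to be the fastest approach.

Here is the timing test environment: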

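import numpy as np
import scipy as sp
import scipy.optimize  # make sure sp.optimize is loaded
import scipy.sparse    # make sure sp.sparse is loaded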

def cost(x, a, b):
    return np.sum((np.sum(a * x.reshape(a.shape), axis=1) - b)**2)

def residuals(x, a, b):
    return np.sum(a * x.reshape(a.shape), axis=1) - b

a = np.random.randn(500, 40)
b = np.arange(500)

def opt1_loopy_min(a, b):
    x = []
    x0 = np.zeros(a.shape[1])
    for i in range(a.shape[0]):
        res = sp.optimize.minimize(cost, x0, args=(a[None, i], b[None, i]))
        x.append(res.x)
    return np.stack(x)

def opt2_loopy_min_start(a, b):
    x = []
    x0 = np.zeros(a.shape[1])
    for i in range(a.shape[0]):
        res = sp.optimize.minimize(cost, x0, args=(a[None, i], b[None, i]))
        x.append(res.x)
        x0 = res.x
    return np.stack(x)

def opt3_loopy_lsq(a, b):
    x = []
    x0 = np.zeros(a.shape[1])
    for i in range(a.shape[0]):
        res = sp.optimize.least_squares(residuals, x0, args=(a[None, i], b[None, i]))
        x.append(res.x)
    return np.stack(x)

def opt4_dense_lsq(a, b):
    res = sp.optimize.least_squares(residuals, np.zeros(a.size), args=(a, b))
    return res.x.reshape(a.shape)

def opt5_jacs_lsq(a, b):
    jacs = sp.sparse.block_diag([np.ones((1, a.shape[1]), dtype=bool)]*a.shape[0])
    res = sp.optimize.least_squares(residuals, np.zeros(a.size), jac_sparsity=jacs, args=(a, b))
    return res.x.reshape(a.shape)
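The %timeit lines above are IPython magics. Outside IPython, a rough equivalent can be put together with the standard-library timeit module. The sketch below is an assumption-laden stand-in, not the setup used for the numbers above: it uses a much smaller problem (20 x 5) so that it finishes quickly, and loopy_min and jacs_lsq are illustrative re-implementations of opt1 and opt5.

```python
import functools
import timeit

import numpy as np
import scipy.optimize
import scipy.sparse

def cost(x, a, b):
    return np.sum((np.sum(a * x.reshape(a.shape), axis=1) - b)**2)

def residuals(x, a, b):
    return np.sum(a * x.reshape(a.shape), axis=1) - b

def loopy_min(a, b):
    # One minimize call per row, as in opt1_loopy_min.
    x = np.zeros(a.shape)
    for i in range(a.shape[0]):
        res = scipy.optimize.minimize(cost, np.zeros(a.shape[1]),
                                      args=(a[None, i], b[None, i]))
        x[i] = res.x
    return x

def jacs_lsq(a, b):
    # One least_squares call with a block-diagonal sparsity pattern, as in opt5_jacs_lsq.
    jacs = scipy.sparse.block_diag([np.ones((1, a.shape[1]), dtype=bool)] * a.shape[0])
    res = scipy.optimize.least_squares(residuals, np.zeros(a.size),
                                       jac_sparsity=jacs, args=(a, b))
    return res.x.reshape(a.shape)

rng = np.random.default_rng(0)
a = rng.standard_normal((20, 5))   # deliberately small so the sketch runs fast
b = np.arange(20)

timings = {}
for func in (loopy_min, jacs_lsq):
    # number=1, repeat=3 mirrors "1 loop, best of 3".
    runs = timeit.repeat(functools.partial(func, a, b), number=1, repeat=3)
    timings[func.__name__] = min(runs)
    print(f"{func.__name__}: best of 3: {min(runs):.4f} s per loop")
```

Timings at this size say little about the 500 x 40 case. The relative ranking can change with problem size, so it is worth re-running with the real dimensions.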

Answer 2

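"I guess looping over minimize causes considerable overhead."

Wrong guess. The time needed to minimize the function dwarfs any loop overhead. There is no vectorization magic for this problem.

Some time can be saved by using a better starting point for the minimization. First, sort the parameters so that consecutive iterations have similar parameters. Then use the end point of the previous minimization as the starting point of the next one: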

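import numpy as np
from scipy.optimize import minimize

# cost() is the function defined in the question
a = np.sort(np.random.randn(500, 40), axis=0)   # sorted parameters
b = np.arange(500)   # no need for np.array here, np.arange is already an ndarray

x = []
x0 = np.zeros(40)
for i in range(a.shape[0]):
    res = minimize(cost, x0, args=(a[None, i], b[None, i]))
    x.append(res.x)
    x0 = res.x   # warm start: reuse this solution as the next starting point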

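This saves 30-40% of the execution time.

Another minor optimization is to preallocate an ndarray of the appropriate size for the resulting x values, instead of using a list and its append method. Before the loop: x = np.zeros((500, 40)); inside the loop: x[i, :] = res.x.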
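Putting both suggestions together (warm start plus a preallocated result array) might look like the sketch below. It uses a small hand-made problem instead of the 500 x 40 one. The array a is an assumption, built so that consecutive rows are similar, which is the situation the sorting step is meant to create.

```python
import numpy as np
from scipy.optimize import minimize

def cost(x, a, b):
    return np.sum((np.sum(a * x.reshape(a.shape), axis=1) - b)**2)

# Six similar rows (made-up data); np.sort keeps the "sorted parameters" step explicit.
a = np.sort(np.array([1.0, 2.0, 0.5, -1.0]) + 0.1 * np.arange(6)[:, None], axis=0)
b = np.arange(6)

x = np.zeros(a.shape)          # preallocated result array instead of a list
x0 = np.zeros(a.shape[1])
for i in range(a.shape[0]):
    res = minimize(cost, x0, args=(a[None, i], b[None, i]))
    x[i, :] = res.x            # write into the preallocated row
    x0 = res.x                 # warm start for the next problem

# Residual of every sub-problem at its own solution.
row_residuals = np.sum(a * x, axis=1) - b
print(x.shape)                          # (6, 4)
print(np.max(np.abs(row_residuals)))    # close to zero
```

Whether the warm start helps depends on how similar neighbouring problems really are. Answer 1 saw no noticeable gain from re-using the starting point without sorting.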

Solving multiple independent optimization problems in scipy