Question

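With the introduction of Tensorflow 2.0, the scipy interface (tf.contrib.opt.ScipyOptimizerInterface) has been removed. However, I would still like to use the scipy optimizer scipy.optimize.minimize(method='L-BFGS-B') to train a neural network (a keras Sequential model). For the optimizer to work, it requires as input a function fun(x0), where x0 is an array of shape (n,). The first step is therefore to "flatten" the weight matrices into a vector of the required shape. To this end, I modified the code provided at https://pychao.com/2019/11/02/optimize-tensorflow-keras-models-with-l-bfgs-from-tensorflow-probability/. It provides a function factory that creates such a function fun(x0). However, the code does not seem to work, and the loss does not decrease. I would be really grateful if someone could help me work this out.

Here is the piece of code I am using: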

func = function_factory(model, loss_function, x_u_train, u_train)

# convert initial model parameters to a 1D tf.Tensor
init_params = tf.dynamic_stitch(func.idx, model.trainable_variables)
init_params = tf.cast(init_params, dtype=tf.float32)

# train the model with L-BFGS solver
results = scipy.optimize.minimize(fun=func, x0=init_params, method='L-BFGS-B')


def loss_function(x_u_train, u_train, network):
    u_pred = tf.cast(network(x_u_train), dtype=tf.float32)
    loss_value = tf.reduce_mean(tf.square(u_train - u_pred))
    return tf.cast(loss_value, dtype=tf.float32)


def function_factory(model, loss_f, x_u_train, u_train):
    """A factory to create a function required by tfp.optimizer.lbfgs_minimize.

    Args:
        model [in]: an instance of `tf.keras.Model` or its subclasses.
        loss [in]: a function with signature loss_value = loss(pred_y, true_y).
        train_x [in]: the input part of training data.
        train_y [in]: the output part of training data.

    Returns:
        A function that has a signature of:
            loss_value, gradients = f(model_parameters).
    """

    # obtain the shapes of all trainable parameters in the model
    shapes = tf.shape_n(model.trainable_variables)
    n_tensors = len(shapes)

    # we'll use tf.dynamic_stitch and tf.dynamic_partition later, so we need to
    # prepare required information first
    count = 0
    idx = [] # stitch indices
    part = [] # partition indices

    for i, shape in enumerate(shapes):
        n = np.product(shape)
        idx.append(tf.reshape(tf.range(count, count+n, dtype=tf.int32), shape))
        part.extend([i]*n)
        count += n

    part = tf.constant(part)


    def assign_new_model_parameters(params_1d):
        """A function updating the model's parameters with a 1D tf.Tensor.

        Args:
            params_1d [in]: a 1D tf.Tensor representing the model's trainable parameters.
        """

        params = tf.dynamic_partition(params_1d, part, n_tensors)
        for i, (shape, param) in enumerate(zip(shapes, params)):

            model.trainable_variables[i].assign(tf.cast(tf.reshape(param, shape), dtype=tf.float32))

    # now create a function that will be returned by this factory

    def f(params_1d):
        """
        This function is created by function_factory.
        Args:
            params_1d [in]: a 1D tf.Tensor.

        Returns:
            A scalar loss.
        """

        # update the parameters in the model
        assign_new_model_parameters(params_1d)
        # calculate the loss
        loss_value = loss_f(x_u_train, u_train, model)

        # print out iteration & loss
        f.iter.assign_add(1)
        tf.print("Iter:", f.iter, "loss:", loss_value)

        return loss_value

    # store these information as members so we can use them outside the scope
    f.iter = tf.Variable(0)
    f.idx = idx
    f.part = part
    f.shapes = shapes
    f.assign_new_model_parameters = assign_new_model_parameters

    return f

Here, model is a tf.keras.Sequential object.

Thank you in advance for any help!

Answer 1

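Switching from tf1 to tf2, I ran into the same problem. After a bit of experimenting I found the solution below, which shows how to set up the interface between a function decorated with tf.function and a scipy optimizer. The important changes compared to the question are: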

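As Ives mentioned, scipy's lbfgs needs both the function value and the gradient, so you need to provide a function that returns both and then set jac=True.

The wrapper in the example also casts between tf.float32 on the TensorFlow side and np.float64 on the scipy side.

I provide an example below showing how this can be done for a toy problem.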

import tensorflow as tf
import numpy as np
import scipy.optimize as sopt

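def model(x):
    # toy objective: sum of squared distances from 2, minimized at x = 2
    return tf.reduce_sum(tf.square(x-tf.constant(2, dtype=tf.float32)))

@tf.function
def val_and_grad(x):
    with tf.GradientTape() as tape:
        # x is a constant tensor, so the tape has to watch it explicitly
        tape.watch(x)
        loss = model(x)
    grad = tape.gradient(loss, x)
    return loss, grad

def func(x):
    # scipy passes float64 arrays: cast to float32 for TF, then back to float64 for scipy
    return [vv.numpy().astype(np.float64)  for vv in val_and_grad(tf.constant(x, dtype=tf.float32))]

resdd= sopt.minimize(fun=func, x0=np.ones(5),
                                      jac=True, method='L-BFGS-B')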

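print("info:\n",resdd)

This shows:

info:
       fun: 7.105427357601002e-14
 hess_inv: <5x5 LbfgsInvHessProduct with dtype=float64>
      jac: array([-2.38418579e-07, -2.38418579e-07, -2.38418579e-07, -2.38418579e-07,
       -2.38418579e-07])
  message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
     nfev: 3
      nit: 2
   status: 0
  success: True
        x: array([1.99999988, 1.99999988, 1.99999988, 1.99999988, 1.99999988])

Benchmark

To compare speed, I use the lbfgs optimizer on a style transfer problem (see here for the network). Note that for this problem the network parameters are fixed and the input signal is adapted. Since the optimized parameters (the input signal) are 1D, no function factory is needed.

I compare four implementations:

TF1.12: TF1 with ScipyOptimizerInterface
TF2.0 (E): the approach above without the tf.function decorator
TF2.0 (G): the approach above with the tf.function decorator
TF2.0/TFP: the lbfgs minimizer from tensorflow_probability

For the comparison, the optimization is stopped after 300 iterations (the problem normally needs 3000 iterations to converge).

Results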

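TF2.0 eager mode (TF2.0 (E)) works correctly but is about 20% slower than the TF1.12 baseline. TF2.0 (G) with tf.function works fine and is marginally faster than TF1.12, which is good news.

The optimizer from tensorflow_probability (TF2.0/TFP) is slightly faster than TF2.0 (G) with scipy's lbfgs, but it does not achieve the same error reduction. In fact, the loss does not decrease monotonically over time, which seems like a bad sign. Comparing the two lbfgs implementations (scipy and tensorflow_probability = TFP), it is clear that the Fortran code in scipy is significantly more complex. So either the simplification of the algorithm in TFP does harm here, or the fact that TFP performs all calculations in float32 may be a problem.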
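One rough way to probe the float32 hypothesis, without involving TFP at all, is to feed scipy's L-BFGS-B values and gradients that were computed in float32 and compare against float64. The sketch below is an assumption-laden illustration, not part of the benchmark: it uses a hand-written 2D Rosenbrock function (rosen_val_and_grad is a made-up name) instead of the style transfer network, so it says nothing about TFP's algorithm itself.

```python
import numpy as np
import scipy.optimize as sopt

def rosen_val_and_grad(x, dtype):
    # evaluate value and gradient of the Rosenbrock function in the requested precision
    x = x.astype(dtype)
    a = x[1:] - x[:-1] ** 2
    b = 1 - x[:-1]
    val = np.sum(100 * a ** 2 + b ** 2)
    grad = np.zeros_like(x)
    grad[:-1] = -400 * x[:-1] * a - 2 * b
    grad[1:] += 200 * a
    # scipy's L-BFGS-B works in float64, so cast back before returning
    return float(val), grad.astype(np.float64)

x0 = np.array([-1.2, 1.0])
results = {}
for dtype in (np.float32, np.float64):
    res = sopt.minimize(fun=rosen_val_and_grad, x0=x0, args=(dtype,),
                        jac=True, method='L-BFGS-B')
    results[dtype.__name__] = res
    print(dtype.__name__, "loss:", res.fun, "nit:", res.nit)
```

If the float32 run stalls at a visibly higher loss than the float64 run, precision is at least a plausible contributor; if both end up in the same place, the algorithmic differences are the more likely suspect.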

Answer 2

Here is a simple solution using a library (autograd_minimize) that I wrote, building on Roebel's answer:

import numpy as np
import tensorflow as tf
from autograd_minimize import minimize

def rosen_tf(x):
    return tf.reduce_sum(100.0*(x[1:] - x[:-1]**2.0)**2.0 + (1 - x[:-1])**2.0)

res = minimize(rosen_tf, np.array([0.,0.]))
print(res.x)
>>> array([0.99999912, 0.99999824])

It also works with keras models, as this simple linear regression example shows:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from autograd_minimize.tf_wrapper import tf_function_factory
from autograd_minimize import minimize 
import tensorflow as tf

#### Prepares data
X = np.random.random((200, 2))
y = X[:,:1]*2+X[:,1:]*0.4-1

#### Creates model
model = keras.Sequential([keras.Input(shape=(2,)),
                          layers.Dense(1)])

# Transforms model into a function of its parameter
func, params = tf_function_factory(model, tf.keras.losses.MSE, X, y)

# Minimization
res = minimize(func, params, method='L-BFGS-B')

print(res.x)
>>> [array([[2.0000016 ],
 [0.40000062]]), array([-1.00000164])]

Answer 3

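I guess SciPy does not know how to compute gradients of TensorFlow objects. Try using the original function factory (i.e., the one that returns the gradients together with the loss) and set jac=True in scipy.optimize.minimize.

I tested the python code from the original Gist and replaced tfp.optimizer.lbfgs_minimize with the SciPy optimizer. It works with the BFGS method: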

results = scipy.optimize.minimize(fun=func, x0=init_params, jac=True, method='BFGS')

jac=True tells SciPy that func also returns the gradients.
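To make the effect of jac=True concrete, here is a minimal sketch with a plain NumPy objective. The quadratic and the name quad_val_and_grad are illustrative only and do not come from the Gist: the point is just that fun returns a (value, gradient) pair and SciPy unpacks it.

```python
import numpy as np
import scipy.optimize

def quad_val_and_grad(x):
    # hypothetical toy objective: f(x) = sum((x - 2)^2), with gradient 2 * (x - 2)
    diff = x - 2.0
    return np.sum(diff ** 2), 2.0 * diff

# jac=True: fun returns (value, gradient) instead of only the value
res = scipy.optimize.minimize(fun=quad_val_and_grad, x0=np.ones(5),
                              jac=True, method='BFGS')
print(res.x)
```

Without jac=True (and without a separate jac callable), SciPy falls back to finite-difference gradients, which needs fun to return a single scalar.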

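For L-BFGS-B, however, it is tricky. After some effort, I finally got it to work. I had to comment out the @tf.function lines and let func return grads.numpy() instead of the raw TF tensor. I guess this is because the underlying implementation of L-BFGS-B is a Fortran function, so converting data from tf.Tensor -> numpy array -> Fortran array may cause problems. Forcing func to return the ndarray version of the gradients resolves the issue, but then it is not possible to use @tf.function.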
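The steps described in this answer can be sketched roughly as follows for a keras model. This is not the Gist's code: make_val_and_grad is a hypothetical helper, the linear regression data is made up for illustration, and everything runs in eager mode (no @tf.function), with the loss and gradients converted to float64 NumPy values before they reach SciPy.

```python
import numpy as np
import scipy.optimize
import tensorflow as tf

def make_val_and_grad(model, loss_f, x_train, y_train):
    """Hypothetical helper: builds fun(x) -> (loss, flat_grad) for SciPy."""
    variables = model.trainable_variables
    shapes = [tuple(v.shape) for v in variables]
    sizes = [int(np.prod(s)) for s in shapes]

    def assign_flat(params_1d):
        # split the flat vector and write each chunk back into the model
        chunks = np.split(params_1d, np.cumsum(sizes)[:-1])
        for var, chunk, shape in zip(variables, chunks, shapes):
            var.assign(chunk.reshape(shape).astype(np.float32))

    def fun(params_1d):
        assign_flat(params_1d)
        with tf.GradientTape() as tape:
            loss = loss_f(x_train, y_train, model)
        grads = tape.gradient(loss, variables)
        # return plain float64 NumPy data, not TF tensors
        flat_grad = np.concatenate([g.numpy().ravel() for g in grads])
        return float(loss.numpy()), flat_grad.astype(np.float64)

    x0 = np.concatenate([v.numpy().ravel() for v in variables]).astype(np.float64)
    return fun, x0

def loss_function(x_u_train, u_train, network):
    # same signature as the loss in the question
    return tf.reduce_mean(tf.square(u_train - network(x_u_train)))

# made-up regression data: y = 2*x0 + 0.4*x1 - 1
rng = np.random.default_rng(0)
X = rng.random((200, 2)).astype(np.float32)
y = (2.0 * X[:, :1] + 0.4 * X[:, 1:] - 1.0).astype(np.float32)

model = tf.keras.Sequential([tf.keras.Input(shape=(2,)),
                             tf.keras.layers.Dense(1)])
fun, x0 = make_val_and_grad(model, loss_function, X, y)
initial_loss, _ = fun(x0)
results = scipy.optimize.minimize(fun=fun, x0=x0, jac=True, method='L-BFGS-B')
print("initial loss:", initial_loss, "final loss:", results.fun)
```

The design choice here is to keep the TensorFlow side in float32 and do all conversions at the boundary, so the model itself does not need to be rebuilt in float64.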

Use Scipy Optimizer with Tensorflow 2.0 for Neural Network training

3 answers:
