Question

Just for context: I am trying to implement a gradient descent algorithm with TensorFlow.

I have a matrix X

$$X = \begin{bmatrix} x_1 & x_2 & x_3 & x_4 \\ x_5 & x_6 & x_7 & x_8 \end{bmatrix}$$

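which I multiply by some feature vector Y to get Z:

$$Z = X Y = X \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}$$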

I then put Z through a softmax function and take the log. I'll call the output matrix W.
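Here is the same computation in index notation, as a sketch. The notation is introduced here and is not in the original. X_{ab} is the entry in row a, column b of X, so x_1, ..., x_4 are X_{11}, ..., X_{14} and x_5, ..., x_8 are X_{21}, ..., X_{24}. The p_i are the softmax probabilities.

```latex
z_a = \sum_{b=1}^{4} X_{ab}\, y_b, \qquad
p_i = \frac{e^{z_i}}{e^{z_1} + e^{z_2}}, \qquad
w_i = \log p_i = z_i - \log\left(e^{z_1} + e^{z_2}\right), \qquad a, i \in \{1, 2\}.
```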

All of this is implemented as follows (with some boilerplate added so that it runs):

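import tensorflow as tf  # TensorFlow 1.x graph-mode API

sess = tf.Session()
num_features = 4
num_actions = 2

# X: the (num_actions x num_features) parameter matrix
policy_matrix = tf.get_variable("params", (num_actions, num_features))
# Y: one feature vector, stored as a column
state_ph = tf.placeholder("float", (num_features, 1))
# Z = X Y
action_linear = tf.matmul(policy_matrix, state_ph)
# W = log(softmax(Z)), with the softmax taken over the action axis
action_probs = tf.nn.softmax(action_linear, axis=0)
action_problogs = tf.log(action_probs)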

W (corresponding to action_problogs) looks like

$$W = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}$$

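I want to find the gradient of w1 with respect to the matrix X. That is, I want to compute

$$\frac{\partial w_1}{\partial X} = \begin{bmatrix} \partial w_1 / \partial x_1 \\ \vdots \\ \partial w_1 / \partial x_8 \end{bmatrix}$$

(ideally shaped like a matrix so that I can add it to X, but I don't really care about that)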
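Differentiating the log-softmax by hand shows what this gradient should look like. The sketch below applies the chain rule. It uses notation introduced here, not in the original: X_{ab} is row a, column b of X, p = softmax(Z), and δ is the Kronecker delta.

```latex
\frac{\partial w_i}{\partial z_j} = \delta_{ij} - p_j,
\qquad
\frac{\partial z_j}{\partial X_{ab}} = \delta_{ja}\, y_b
\qquad\Longrightarrow\qquad
\frac{\partial w_1}{\partial X_{ab}} = (\delta_{1a} - p_a)\, y_b,
\qquad
\frac{\partial w_1}{\partial X} =
\begin{bmatrix}
(1 - p_1)\, y_1 & (1 - p_1)\, y_2 & (1 - p_1)\, y_3 & (1 - p_1)\, y_4 \\
-p_2\, y_1 & -p_2\, y_2 & -p_2\, y_3 & -p_2\, y_4
\end{bmatrix}
```

So the gradient of the single scalar w1 has 8 entries. These can be arranged in the same 2×4 shape as X.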

I hoped tf.gradients would do this. I computed the "gradient" like this:

problog_gradient = tf.gradients(action_problogs, policy_matrix)

However, when I inspect problog_gradient, this is what I get:

[<tf.Tensor 'foo_4/gradients/foo_4/MatMul_grad/MatMul:0' shape=(2, 4) dtype=float32>]

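Note that this has exactly the same shape as X, which I don't think it should. I expected a list of two gradients, each with 8 elements. My guess is that I am getting two gradients, but each with only four elements.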
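For comparison: tf.gradients(ys, xs) differentiates the sum of the entries of ys, and returns one tensor per element of xs with that element's shape. Applying the same chain rule to both outputs, the returned (2, 4) tensor should be the following. The notation is introduced here: X_{ab} is row a, column b of X, and p = softmax(Z).

```latex
\frac{\partial}{\partial X_{ab}} \left( w_1 + w_2 \right)
  = \sum_{i=1}^{2} (\delta_{ia} - p_a)\, y_b
  = (1 - 2 p_a)\, y_b,
\qquad
\frac{\partial W}{\partial X} \in \mathbb{R}^{2 \times 2 \times 4}
```

The full Jacobian has 2 × 2 × 4 = 16 entries, which matches the expected "two gradients with 8 elements each". The (2, 4) tensor that was returned is the sum of those two gradients.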

I am very new to TensorFlow, so I would appreciate an explanation of what is going on and how to get the behavior I want.

Answer 1

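tf.gradients expects a scalar function, so when ys is not a scalar it sums the entries by default. This is the default behavior because every gradient descent algorithm needs exactly this kind of function, and stochastic gradient descent (or a variant of it) is the preferred method inside TensorFlow. You won't find any more advanced algorithms (such as BFGS) because they haven't been implemented yet, and they would need a true Jacobian, which hasn't been implemented either. For what it's worth, here is a working Jacobian implementation I wrote: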
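The following pure-Python sketch shows what "summing the entries" means for the question's graph. It does not call TensorFlow, and it uses the closed-form log-softmax derivative (delta_ia - p_a) * y_b rather than automatic differentiation. The helper names `forward`, `jacobian_logsoftmax` and `summed_gradient`, and the values in X and y, are hypothetical.

The full Jacobian holds one 2×4 gradient per output w_i. Adding those gradients together gives a single 2×4 array, which is the shape the question got back.

```python
import math

def forward(X, y):
    """Return (p, w) with p = softmax(X y) and w = log(p), for a 2x4 X and length-4 y."""
    z = [sum(x_ab * y_b for x_ab, y_b in zip(row, y)) for row in X]
    log_norm = math.log(sum(math.exp(z_a) for z_a in z))
    w = [z_a - log_norm for z_a in z]
    p = [math.exp(w_a) for w_a in w]
    return p, w

def jacobian_logsoftmax(X, y):
    """Full Jacobian J[i][a][b] = d w_i / d X_ab = (delta_ia - p_a) * y_b."""
    p, _ = forward(X, y)
    n = len(X)
    return [[[((1.0 if i == a else 0.0) - p[a]) * y_b for y_b in y]
             for a in range(n)]
            for i in range(n)]

def summed_gradient(J):
    """The quantity tf.gradients(ys, xs) reports: the gradient of sum_i w_i."""
    rows, cols = len(J[0]), len(J[0][0])
    return [[sum(J_i[a][b] for J_i in J) for b in range(cols)] for a in range(rows)]

# Hypothetical example values
X = [[0.1, -0.2, 0.3, 0.0],
     [0.5, 0.1, -0.4, 0.2]]
y = [1.0, 2.0, 3.0, 4.0]

p, w = forward(X, y)
J = jacobian_logsoftmax(X, y)   # shape 2 x 2 x 4: one 2x4 gradient per output w_i
G = summed_gradient(J)          # shape 2 x 4: the same shape as X
print(len(J), len(J[0]), len(J[0][0]))  # 2 2 4
print(len(G), len(G[0]))                # 2 4
```

J[0] is the gradient of w1 alone. G mixes both outputs together.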

import tensorflow as tf  # TensorFlow 1.x API (tf.while_loop, tf.TensorArray, tf.gradients)

def map(f, x, dtype=None, parallel_iterations=10):
    '''
    Apply f to each of the elements in x using the specified number of parallel iterations.

    Important points:
    1. By "elements in x", we mean that we will be applying f to x[0],...x[tf.shape(x)[0]-1].
    2. The output size of f(x[i]) can be arbitrary. However, if the dtype of that output
       is different than the dtype of x, then you need to specify that as an additional argument.
    '''
    if dtype is None:
        dtype = x.dtype

    n = tf.shape(x)[0]
    loop_vars = [
        tf.constant(0, n.dtype),
        tf.TensorArray(dtype, size=n),
    ]
    _, fx = tf.while_loop(
        lambda j, _: j < n,
        lambda j, result: (j + 1, result.write(j, f(x[j]))),
        loop_vars,
        parallel_iterations=parallel_iterations
    )
    return fx.stack()

def jacobian(fx, x, parallel_iterations=10):
    '''
    Given a tensor fx, which is a function of x, vectorize fx (via tf.reshape(fx, [-1])),
    and then compute the jacobian of each entry of fx with respect to x.
    Specifically, if x has shape (m,n,...,p), and fx has L entries (tf.size(fx)=L), then
    the output will have shape (L,m,n,...,p), where output[i] has shape (m,n,...,p) and each
    of its entries is the gradient of the i-th entry of the flattened fx wrt the corresponding element of x.
    '''
    return map(lambda fxi: tf.gradients(fxi, x)[0],
               tf.reshape(fx, [-1]),
               dtype=x.dtype,
               parallel_iterations=parallel_iterations)
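The jacobian function above returns one slice per entry of the flattened fx, and each slice has the shape of x. The sketch below produces the same (L, m, n) layout without TensorFlow.

It swaps tf.gradients for central finite differences. That is a different technique from the code above: it loops over the inputs rather than the outputs. For the question's W, the first slice should match the closed form d w1 / dX_ab = (delta_1a - p_a) * y_b, with p = softmax(X y). The function names `log_softmax_of_matvec` and `numeric_jacobian`, and the sample values, are hypothetical.

```python
import math

def log_softmax_of_matvec(X, y):
    """f(X) = log(softmax(X y)) for a 2x4 X (list of rows) and a length-4 y."""
    z = [sum(x * y_b for x, y_b in zip(row, y)) for row in X]
    log_norm = math.log(sum(math.exp(z_a) for z_a in z))
    return [z_a - log_norm for z_a in z]

def numeric_jacobian(f, X, eps=1e-6):
    """Jacobian of f at X by central differences.

    The result is laid out like jacobian(): J[i] has the shape of X and holds
    d f(X)[i] / dX, estimated as (f(X + eps*E_ab) - f(X - eps*E_ab)) / (2*eps).
    """
    n_out = len(f(X))
    J = [[[0.0] * len(X[0]) for _ in X] for _ in range(n_out)]
    for a in range(len(X)):
        for b in range(len(X[0])):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[a][b] += eps
            X_minus[a][b] -= eps
            f_plus, f_minus = f(X_plus), f(X_minus)
            for i in range(n_out):
                J[i][a][b] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J

# Hypothetical example values
X = [[0.1, -0.2, 0.3, 0.0],
     [0.5, 0.1, -0.4, 0.2]]
y = [1.0, 2.0, 3.0, 4.0]
J = numeric_jacobian(lambda M: log_softmax_of_matvec(M, y), X)
grad_w1 = J[0]  # d w1 / dX: 2x4, the same shape as X
```

A finite-difference check like this is also a cheap way to sanity-check the output of an autodiff-based Jacobian on small inputs.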

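Although this implementation works, it breaks when you try to nest it. For example, if you try to compute a Hessian with jacobian(jacobian(...)), you get some strange errors. This is tracked as Issue 675. I am still awaiting a response on why the error is thrown. I believe there is a deep-rooted bug in either the while-loop implementation or the gradient implementation, but I really don't know.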

Anyway, if you only need a Jacobian, try the code above.

Answer 2

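tf.gradients actually sums the entries of ys and computes the gradient of that sum with respect to xs. That is why the result has the shape of X instead of one gradient per entry of W.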
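Put differently, tf.gradients computes a vector-Jacobian product with a weight of 1 on every entry of ys. In TensorFlow 1.x, tf.gradients also accepts a grad_ys argument that supplies those weights explicitly, so a one-hot weight isolates a single output.

The pure-Python sketch below illustrates only the arithmetic. It does not call TensorFlow, and it uses the closed-form log-softmax Jacobian (delta_ia - p_a) * y_b. The names `log_softmax_jacobian` and `vector_jacobian_product`, and the sample values, are hypothetical.

```python
import math

def log_softmax_jacobian(X, y):
    """J[i][a][b] = d w_i / d X_ab = (delta_ia - p_a) * y_b, where w = log(softmax(X y))."""
    z = [sum(x * y_b for x, y_b in zip(row, y)) for row in X]
    norm = sum(math.exp(z_a) for z_a in z)
    p = [math.exp(z_a) / norm for z_a in z]
    n = len(X)
    return [[[((1.0 if i == a else 0.0) - p[a]) * y_b for y_b in y] for a in range(n)]
            for i in range(n)]

def vector_jacobian_product(weights, J):
    """sum_i weights[i] * J[i]: the weighted sum that grad_ys controls in tf.gradients."""
    rows, cols = len(J[0]), len(J[0][0])
    return [[sum(wt * J[i][a][b] for i, wt in enumerate(weights)) for b in range(cols)]
            for a in range(rows)]

# Hypothetical example values
X = [[0.1, -0.2, 0.3, 0.0],
     [0.5, 0.1, -0.4, 0.2]]
y = [1.0, 2.0, 3.0, 4.0]
J = log_softmax_jacobian(X, y)

default_grad = vector_jacobian_product([1.0, 1.0], J)  # all-ones weights: d(w1 + w2)/dX
grad_w1 = vector_jacobian_product([1.0, 0.0], J)       # one-hot weights: d w1 / dX only
```

In the question's TensorFlow 1.x graph, two analogous calls should each return a list holding one (2, 4) tensor equal to d w1 / dX. This is an untested sketch:

- tf.gradients(action_problogs[0, 0], policy_matrix), which differentiates w1 alone.
- tf.gradients(action_problogs, policy_matrix, grad_ys=tf.constant([[1.0], [0.0]])), which supplies one-hot weights.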

TensorFlow gradient with respect to a matrix
