Softmax Jacobian in TensorFlow

Date: 2017-01-25 00:36:46

Tags: python tensorflow neural-network

Suppose I have a simple single-layer neural network:

x = tf.placeholder(tf.float32, [batch_size, input_dim])
W = tf.Variable(tf.random_normal([input_dim, output_dim]))
a = tf.matmul(x, W)
y = tf.nn.softmax(a)

Thus, the variable y has dimensions batch_size × output_dim. I want to compute the Jacobian of y with respect to a for each sample in the batch, which would have dimensions batch_size × output_dim × output_dim. Mathematically, the Jacobian is (dy/da)_{i,j} = -y_i y_j for i != j, and (dy/da)_{i,i} = y_i (1 - y_i) on the diagonal.

How can I compute this Jacobian of the softmax with respect to its input in TensorFlow? I know that tf.gradients computes the gradient of a scalar with respect to a tensor, so I imagine that some combination of looping in TensorFlow with tf.gradients, or even just implementing the analytical form given above, should work. But I am not sure how to do this with TensorFlow's ops, and would appreciate any code that does it!
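For reference, the analytical form above can also be assembled directly with batch-wise ops; the following is a minimal sketch, assuming a TensorFlow 1.x-era API (tf.matrix_diag) and arbitrary example dimensions:

import tensorflow as tf

# Example dimensions (for illustration only)
batch_size, input_dim, output_dim = 3, 10, 20

x = tf.placeholder(tf.float32, [batch_size, input_dim])
W = tf.Variable(tf.random_normal([input_dim, output_dim]))
a = tf.matmul(x, W)
y = tf.nn.softmax(a)                              # [batch_size, output_dim]

# Batched analytical Jacobian of softmax w.r.t. its input a:
# (dy/da)_{i,j} = -y_i y_j for i != j and y_i (1 - y_i) on the diagonal,
# i.e. J = diag(y) - y y^T for every row of y.
y_col = tf.expand_dims(y, 2)                      # [batch_size, output_dim, 1]
y_row = tf.expand_dims(y, 1)                      # [batch_size, 1, output_dim]
jacobian = tf.matrix_diag(y) - y_col * y_row      # [batch_size, output_dim, output_dim]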

1 Answer:

Answer 0 (score: 4)

It seems that tf.gradients applies a sum over output_dim. Solution: unpack, then repack. Not sure how this affects efficiency...

import numpy as np
import tensorflow as tf

batch_size = 3
input_dim = 10
output_dim = 20

W_vals = np.random.rand(input_dim, output_dim).astype(np.float32)

graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, [batch_size, input_dim])
    # Use a constant for easier checking
    W = tf.constant(W_vals, dtype=tf.float32)
    a = tf.matmul(x, W)
    # Remove the softmax for easier checking
    y = a
    # y = tf.nn.softmax(a)

    grads = tf.stack([tf.gradients(yi, x)[0] for yi in tf.unstack(y, axis=1)],
                     axis=2)

with tf.Session(graph=graph) as sess:
    x_vals = np.random.rand(batch_size, input_dim).astype(np.float32)
    g_vals = sess.run(grads, feed_dict={x: x_vals})

# check gradients match
tol = 1e-10
for i in range(batch_size):
    if np.max(np.abs(g_vals[i] - W_vals)) >= tol:
        raise Exception('gradients do not match W')
print('Gradients seem to match!')
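To check the softmax case itself rather than the linear stand-in, one can restore y = tf.nn.softmax(a) and compare the unpacked/repacked gradients against the analytical form diag(y) - y yᵀ. A rough sketch, assuming it runs right after the code above (note the gradients are taken w.r.t. x, so the chain rule contributes a factor of W):

# Continuation sketch: add softmax ops to the same graph and compare
# tf.gradients with the analytical Jacobian diag(y) - y y^T.
with graph.as_default():
    y_sm = tf.nn.softmax(a)
    grads_sm = tf.stack(
        [tf.gradients(yi, x)[0] for yi in tf.unstack(y_sm, axis=1)], axis=2)

with tf.Session(graph=graph) as sess:
    y_vals, g_sm_vals = sess.run([y_sm, grads_sm], feed_dict={x: x_vals})

for i in range(batch_size):
    # Per-sample analytical Jacobian dy/da, then chain rule through a = x W:
    # dy/dx = W @ (diag(y) - y y^T), shape [input_dim, output_dim].
    J = np.diag(y_vals[i]) - np.outer(y_vals[i], y_vals[i])
    if not np.allclose(g_sm_vals[i], W_vals.dot(J), atol=1e-5):
        raise Exception('softmax gradients do not match the analytical form')
print('Softmax gradients seem to match too!')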