Question

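I want to create a function in TensorFlow that, for each row of the given data X, applies the softmax function only to some sampled classes (let's say 2) out of K total classes, and returns a matrix S with S.shape = (N,K) (N: number of rows in the given data; K: total number of classes).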

The matrix S will therefore contain zeros, with non-zero values only at the indices of the classes sampled for each row.

In plain Python I would use advanced indexing, but I can't figure out how to do this in TensorFlow. My original question is this, where I present the NumPy code.
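For reference, here is a minimal NumPy sketch of the S described above, built with advanced indexing. It is not the NumPy code from the linked question: the helper name sampled_softmax_matrix, drawing distinct classes separately for each row, and computing logits as X @ W.T (to match a W of shape (K, D) as in the TensorFlow code below) are all assumptions.

```python
import numpy as np


def sampled_softmax_matrix(X, W, num_samps, rng):
    """Hypothetical helper: softmax over num_samps sampled classes per row of X.

    X has shape (N, D) and W has shape (K, D). Returns S of shape (N, K), which
    is zero except at each row's sampled classes, plus the sampled indices.
    """
    N, K = X.shape[0], W.shape[0]
    # Assumption: each row gets its own num_samps distinct classes.
    sampled = np.stack([rng.choice(K, size=num_samps, replace=False) for _ in range(N)])
    # Logits only for the sampled classes: logits[i, j] = X[i] . W[sampled[i, j]]
    logits = np.einsum("nd,nsd->ns", X, W[sampled])
    logits -= logits.max(axis=1, keepdims=True)  # shift for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    S = np.zeros((N, K))
    rows = np.arange(N)[:, None]  # shape (N, 1), broadcasts against sampled (N, num_samps)
    S[rows, sampled] = probs      # advanced-indexing scatter into the zero matrix
    return S, sampled


rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
W = rng.uniform(size=(5, 3))
S, sampled = sampled_softmax_matrix(X, W, num_samps=2, rng=rng)
print(S.round(3))
```

Each row of S sums to 1 and has exactly num_samps non-zero entries; everything else stays zero.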

So I tried to find a solution in TensorFlow. The main idea is to treat S not as a 2-D matrix but as a 1-D array. The code looks like this:

import tensorflow as tf  # TensorFlow 1.x API

# N, K, D and the data X (NumPy array of shape (N, D)) are assumed to be defined
num_samps = 2
S = tf.Variable(tf.zeros(shape=(N*K)))  # S stored as a flat vector of length N*K
W = tf.Variable(tf.random_uniform((K,D)))
tfx = tf.placeholder(tf.float32,shape=(None,D))
# maxval is exclusive for integer dtypes, so maxval=K samples from classes 0..K-1
sampled_ind = tf.random_uniform(dtype=tf.int32, minval=0, maxval=K, shape=[num_samps])
ar_to_sof = tf.matmul(tfx,tf.gather(W,sampled_ind),transpose_b=True)
updates = tf.reshape(tf.nn.softmax(ar_to_sof),shape=(num_samps,))
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for line in range(N):
    inds_new = sampled_ind + line*K
    sess.run(tf.scatter_update(S,inds_new,updates), feed_dict={tfx: X[line:line+1]})

S = tf.reshape(S,shape=(N,K))
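The 1-D idea relies on row-major flattening: entry (i, j) of an N x K matrix sits at flat position i*K + j, which is exactly what sampled_ind + line*K computes for row line. A pure-Python check with made-up values (no TensorFlow involved):

```python
N, K = 3, 5
flat = [0.0] * (N * K)  # flattened S, like tf.zeros(shape=(N*K))

line = 1                # row being updated
sampled_ind = [3, 0]    # toy sampled classes for this row
updates = [0.7, 0.3]    # toy softmax values for those classes

# Same arithmetic as inds_new = sampled_ind + line*K
inds_new = [j + line * K for j in sampled_ind]
for idx, val in zip(inds_new, updates):
    flat[idx] = val

# Row-major "reshape" back to N rows of K, like tf.reshape(S, shape=(N, K))
S = [flat[r * K:(r + 1) * K] for r in range(N)]
print(inds_new)  # [8, 5]
print(S[1])      # [0.3, 0.0, 0.0, 0.7, 0.0]
```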

This works and the results are as expected, but it runs very slowly. Why is that, and how can I make it faster?

Answer 1

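When programming in TensorFlow, it is crucial to understand the difference between defining operations and executing them. Most of the functions starting with tf. that you call in Python only add operations to the computation graph; nothing is computed until those operations are run in a session.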

For example, when you call:

tf.scatter_update(S,inds_new,updates)

and:

inds_new = sampled_ind + line*K

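over and over, every loop iteration adds a new add op and a new scatter_update op, so your computation graph grows far beyond what is necessary: it fills up memory and slows everything down dramatically.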

What you should do instead is define the computation once, before the loop:

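# line_ph (name chosen here) is a placeholder for the row index, because the
# Python loop variable `line` does not exist yet when the graph is built
line_ph = tf.placeholder(tf.int32, shape=())
init = tf.initialize_all_variables()
inds_new = sampled_ind + line_ph*K
update_op = tf.scatter_update(S, inds_new, updates)
sess = tf.Session()
sess.run(init)
for line in range(N):
    sess.run(update_op, feed_dict={tfx: X[line:line+1], line_ph: line})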

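This way, your computation graph contains only one copy of inds_new and update_op. Note that when you run update_op, inds_new is also executed implicitly, because it is a parent of update_op in the computation graph.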
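The define-once, run-many pattern, and the way running update_op also evaluates its parent inds_new, can be illustrated with a toy dataflow graph in plain Python. ToyGraph and its methods are hypothetical and say nothing about how TensorFlow is implemented internally. In real TensorFlow 1.x you can watch the op count with len(tf.get_default_graph().get_operations()), or call sess.graph.finalize() before the loop so that any accidental op creation inside it raises an error.

```python
class ToyGraph:
    """Toy dataflow graph: nodes are defined once, then evaluated lazily by run()."""

    def __init__(self):
        self.nodes = []  # each node is (function, list of parent node ids)

    def define(self, fn, parents=()):
        self.nodes.append((fn, list(parents)))
        return len(self.nodes) - 1  # id of the new node

    def placeholder(self):
        return self.define(None)  # value must be supplied through feed at run time

    def run(self, node, feed):
        fn, parents = self.nodes[node]
        if fn is None:
            return feed[node]
        # Parents are evaluated first, like a session pulling in dependencies.
        return fn(*(self.run(p, feed) for p in parents))


K = 5

# Define once, before the loop (the pattern from the answer).
g = ToyGraph()
line_ph = g.placeholder()
sampled_ind = g.define(lambda: [3, 0])  # fixed toy stand-in for the random sample
inds_new = g.define(lambda s, line: [j + line * K for j in s], [sampled_ind, line_ph])
update_op = g.define(lambda inds: inds, [inds_new])  # stand-in: returns the indices it would write
results = [g.run(update_op, {line_ph: line}) for line in range(3)]
print(len(g.nodes))  # 4: running does not add nodes
print(results)       # [[3, 0], [8, 5], [13, 10]]

# Anti-pattern from the question: defining nodes inside the loop.
g_bad = ToyGraph()
for line in range(3):
    s = g_bad.define(lambda: [3, 0])
    g_bad.define(lambda s_val, line=line: [j + line * K for j in s_val], [s])
print(len(g_bad.nodes))  # 6, and it grows linearly with the number of rows
```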

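You should also know that update_op may produce different results each time it runs, because sampled_ind is drawn anew on every run; that is fine and expected.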

By the way, a good way to debug this kind of problem is to visualize the computation graph with TensorBoard. Add this to your code:

summary_writer = tf.train.SummaryWriter('some_logdir', sess.graph_def)
# TensorFlow 0.12+ replaced this with: tf.summary.FileWriter('some_logdir', sess.graph)

Then run this in a console:

tensorboard --logdir=some_logdir

TensorBoard serves an HTML page showing a picture of the computation graph, where you can inspect your tensors.

Answer 2

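Keep in mind that tf.scatter_update returns the tensor S, so fetching it in a session run means a large memory copy, or even a network copy in a distributed setting. A solution, building on @sygi's answer: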

update_op = tf.scatter_update(S, inds_new, updates)
update_op_op = update_op.op

Then, in the session run (with the same feed_dict as before), you do:

sess.run(update_op_op)

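This avoids copying the large tensor S: when you fetch an Operation, sess.run returns None instead of the tensor's value.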
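A toy illustration of why fetching the op is cheaper inside a per-row loop: the hypothetical ToyVariable below mimics a fetch where asking for the updated tensor copies the whole buffer back to the caller, while asking for the op returns None. It only counts copied elements and does not model TensorFlow's real memory or network behaviour, but it shows that fetching the tensor costs on the order of N * (N * K) copied elements over the whole loop.

```python
class ToyVariable:
    """Hypothetical stand-in for a variable that lives inside a session."""

    def __init__(self, size):
        self.buffer = [0.0] * size
        self.elements_copied = 0  # how many elements were sent back to the caller

    def scatter_update(self, indices, values, fetch_tensor):
        for i, v in zip(indices, values):
            self.buffer[i] = v  # in-place update, no copy
        if fetch_tensor:
            # Fetching the updated tensor: the whole variable is copied out.
            out = list(self.buffer)
            self.elements_copied += len(out)
            return out
        return None  # fetching only the op: nothing comes back


N, K = 200, 50
v_tensor = ToyVariable(N * K)
v_op = ToyVariable(N * K)
for line in range(N):
    v_tensor.scatter_update([line * K], [1.0], fetch_tensor=True)  # like sess.run(update_op)
    v_op.scatter_update([line * K], [1.0], fetch_tensor=False)     # like sess.run(update_op_op)
print(v_tensor.elements_copied)  # 2000000 = N * (N * K)
print(v_op.elements_copied)      # 0
```

Both variables end up with identical contents; only the amount of data handed back differs.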

TensorFlow is too slow in a Python for loop
