Question

import tensorflow as tf

tf.enable_eager_execution()

emb = tf.ones([100,16])
start_pos = tf.constant([1,2])
end_pos = tf.constant([11,31])

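Given a large matrix emb, together with start positions start_pos and end positions end_pos, how can I get the reduce_sum of emb over each range? For example, the result here should have shape (2, 16), where the first row is the sum of rows 1 through 11 of emb and the second row is the sum of rows 2 through 31.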

Note that I am trying to run this on the GPU (tf.py_func works, but it runs on the CPU).

Update: I have a solution, but it is not matrix-based. I use tf.while_loop to iterate over each position in start_pos / end_pos and compute the sum for it.

Answer 1

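EDIT:

Actually, doing this in a vectorized way is not that hard. It takes more memory, because it builds an intermediate tensor of shape (number of slices, rows of emb, columns of emb), but it should be faster: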

import tensorflow as tf

tf.enable_eager_execution()

emb = tf.ones([100,16])
start_pos = tf.constant([1,2])
end_pos = tf.constant([11,31])
# Select indices of each slice
r = tf.range(tf.shape(emb)[0])[tf.newaxis]
m = (r >= start_pos[:, tf.newaxis]) & (r <= end_pos[:, tf.newaxis])
# Broadcast and tile original matrix
s = tf.cast(m, emb.dtype)[..., tf.newaxis] * emb[tf.newaxis]
# Compute sums
result = tf.reduce_sum(s, axis=1)
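If the extra memory is a concern, note that multiplying by the mask and summing over the row axis is the same computation as a matrix product of the mask with emb, which never materializes the 3-D intermediate. The following is a minimal NumPy sketch of that equivalence, not code from the original answer; the test matrix built with np.arange and the names masked_sum and matmul_sum are chosen only for illustration. In TensorFlow the analogous call would presumably be tf.matmul(tf.cast(m, emb.dtype), emb).

```python
import numpy as np

# Illustrative data: distinct integer values so that a wrong range is detectable.
emb = np.arange(100 * 16, dtype=np.float32).reshape(100, 16)
start_pos = np.array([1, 2])
end_pos = np.array([11, 31])

# Boolean mask of shape (2, 100): True where the row index lies in [start, end].
r = np.arange(emb.shape[0])[np.newaxis]
m = (r >= start_pos[:, np.newaxis]) & (r <= end_pos[:, np.newaxis])

# Broadcast version, as in the answer: builds a (2, 100, 16) intermediate.
masked_sum = (m.astype(emb.dtype)[..., np.newaxis] * emb[np.newaxis]).sum(axis=1)

# Matrix-product version: (2, 100) @ (100, 16) -> (2, 16), no 3-D intermediate.
matmul_sum = m.astype(emb.dtype) @ emb
```

Both results have shape (2, 16) and match the sums of the explicit slices emb[1:12] and emb[2:32].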

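Unfortunately, I don't think there is a way to extract multiple slices and sum them in a single operation. If the number of slices is fixed, you can do it in a regular Python loop: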

import tensorflow as tf

tf.enable_eager_execution()

emb = tf.ones([100,16])
start_pos = tf.constant([1,2])
end_pos = tf.constant([11,31])
batch_size = 2

result = []
for i in range(batch_size):
    # Slice from start_pos[i] through end_pos[i] inclusive, then sum over rows
    result.append(tf.reduce_sum(emb[start_pos[i]:end_pos[i] + 1], axis=0))
result = tf.stack(result, axis=0)

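If the number of slices is only known at graph execution time, or if there are so many slices that you don't want that many nodes in the graph, you can use tf.while_loop: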

import tensorflow as tf

tf.enable_eager_execution()

emb = tf.ones([100,16])
start_pos = tf.constant([1,2])
end_pos = tf.constant([11,31])
batch_size = 2

result = tf.TensorArray(emb.dtype, batch_size)
_, result = tf.while_loop(lambda i, _: i < batch_size,
                          lambda i, result: (i + 1, result.write(i, tf.reduce_sum(emb[start_pos[i]:end_pos[i] + 1], axis=0))),
                          [0, result])
result = result.stack()
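As a further alternative that is not part of the answer above, range sums can also be computed from a cumulative sum over the rows: each range costs one subtraction and no loop is needed. This is a minimal NumPy sketch under that assumption; the prefix array and the np.arange test matrix are illustrative names and data. A TensorFlow version could presumably be built from tf.cumsum and tf.gather. For floating-point inputs the subtraction can lose some precision compared with summing each slice directly.

```python
import numpy as np

# Illustrative data: distinct integer values, exactly representable in float64.
emb = np.arange(100 * 16, dtype=np.float64).reshape(100, 16)
start_pos = np.array([1, 2])
end_pos = np.array([11, 31])

# prefix[k] holds the sum of rows 0..k-1 of emb, so prefix[0] is all zeros.
zero_row = np.zeros((1, emb.shape[1]), dtype=emb.dtype)
prefix = np.concatenate([zero_row, np.cumsum(emb, axis=0)], axis=0)

# Sum of rows start..end inclusive = prefix[end + 1] - prefix[start].
result = prefix[end_pos + 1] - prefix[start_pos]
```

The memory cost here is one extra array the size of emb, independent of the number of ranges.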

How can I get a batched reduce_sum over different ranges of a large matrix?
