Question

I am currently trying to implement a modified form (discussed later) of this paper.

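In particular, equations 5, 7 and 8 can be summarized as follows:

Given a batch of data points x, we compute the following values

e_i = x_i A r (where A and r are tensors of appropriate dimensions)

Then the values e_i are normalized across the batch (a softmax, giving weights α_i)

Finally, a single representation is computed for the whole batch: s = Σ_i α_i x_i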
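For reference, here is a minimal NumPy sketch of these three steps for a single group. It assumes the normalization is a softmax over the data points of the batch (as in the toy code further down) and uses a hypothetical diagonal A and random r; it is not the paper's code.

```python
import numpy as np

def group_representation(x, A, r):
    """Attention-pool one group x of shape [n, d] into a [1, d] vector."""
    e = x @ A @ r                                   # e_i = x_i A r, shape [n, 1]
    w = np.exp(e - e.max(axis=0, keepdims=True))    # shift by the max for numerical stability
    alpha = w / w.sum(axis=0, keepdims=True)        # softmax over the n data points
    return (alpha * x).sum(axis=0, keepdims=True)   # s = sum_i alpha_i x_i

rng = np.random.RandomState(0)
d = 10
x = rng.random_sample((4, d))                  # one group of 4 data points
A = np.diag(rng.normal(scale=0.001, size=d))   # hypothetical diagonal A
r = rng.random_sample((d, 1))                  # hypothetical r
s = group_representation(x, A, r)
print(s.shape)  # (1, 10)
```

When all scores e_i are equal, the weights are uniform and s is simply the mean of the group.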

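This works fine. It is worth mentioning that these batches are not created randomly; each one represents a specific set of data points that together form a group representation. So in a given batch we can only pass data points that belong to one particular group. The batch size is therefore variable, since each group can have a different number of data points. Now, for my use case, we go one step further.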

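Let's say our pseudo-batch size is 32.

In a single training step, we pass K_1 + K_2 + ... + K_32 = K data points, where K_1, K_2, ..., K_32 denote the sizes of the different sub-groups. So after evaluating the equations above, instead of getting 1 output representation for the whole batch, we need 32 output representations, one per group. What is the best way to do this?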

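One thing that may (or may not) be necessary is to pass the group number in another placeholder, as follows:

Data_Point 1 -> Sub-batch 1
Data_Point 2 -> Sub-batch 1
Data_Point 3 -> Sub-batch 1
Data_Point 4 -> Sub-batch 2
Data_Point 5 -> Sub-batch 2
...
Data_Point K -> Sub-batch 32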
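A minimal sketch of building such a group-id vector from the sub-group sizes with NumPy's np.repeat. The sizes are made up, and the ids start at 0 (as group_num_list does in the toy code) rather than at 1 as in the listing above.

```python
import numpy as np

sub_group_sizes = [3, 2, 4]  # hypothetical K_1, K_2, K_3
# One group id per data point, in the same order as the stacked data.
group_ids = np.repeat(np.arange(len(sub_group_sizes)), sub_group_sizes)
K = int(np.sum(sub_group_sizes))  # K = K_1 + K_2 + K_3
print(group_ids)  # [0 0 0 1 1 2 2 2 2]
```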

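However, I can't come up with a more concrete technique. Any help would be greatly appreciated.

Edit 1: I have created a small toy setup of the above problem here

As you can see, the generated group representation currently has shape 1 * 10, but it should be 5 * 10.

Edit 2: I got it working, but I am worried about the correctness and efficiency of my code. It would be great if someone could help me check whether the code is correct and, if so, whether I can optimize it any further.

Right now, I use a while loop to iterate over each sub-group and a mask operation to extract the subset of values. The resulting group representation is written to a specific index of a TensorArray object. Since each sub-group is independent of the others, I can (probably) parallelize all iterations of the while loop.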

import random
import numpy as np
import tensorflow as tf

seed = 12
tf.set_random_seed(seed)
np.random.seed(seed)
random.seed(seed)

max_sequence_length = 10
pseudo_batch_size = 5

def get_data():
    data_list, group_num_list = [], []
    sub_group_sizes = list(np.random.randint(2, 6, pseudo_batch_size))
    for group_num, size in enumerate(sub_group_sizes):
        group_data = np.random.random_sample((size, max_sequence_length))
        data_list.extend(group_data)
        group_num_list.extend([group_num] * size)

    print("Number of Data Points %s" % (len(data_list)))
    print("Group Sizes %s" % (sub_group_sizes))
    data_x = np.array(data_list)
    print("Shape of Data %s" % (data_x.shape,))

    return (data_x, group_num_list)


def get_attention_weighted_rep(x):
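    # e_i = x_i A r, shape [group_size, 1]
    x_prime = tf.matmul(tf.matmul(x, W_att), W_r)
    # Normalize across the data points of the group (axis 0). A softmax over the
    # last axis of a [group_size, 1] tensor would return all ones.
    attention = tf.transpose(tf.nn.softmax(tf.transpose(x_prime)))
    group_representation = tf.reduce_sum(attention * x, axis=0, keep_dims=True)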
    return group_representation


def loop_body(initial_loop_val, outputs_x):
    eq_op = tf.equal(input_group_num, initial_loop_val)
    mask_op_x = tf.boolean_mask(input_x, eq_op)
    group_representation = get_attention_weighted_rep(mask_op_x)
    outputs_x = outputs_x.write(initial_loop_val, group_representation)
    return initial_loop_val + 1, outputs_x


def condition(initial_loop_val, outputs_x):
    return tf.less(initial_loop_val, pseudo_batch_size)


train_x, group_num_list = get_data()

initial_loop_val = tf.constant(0)
input_x = tf.placeholder(tf.float32, shape=[None, max_sequence_length], name="input_x")
input_group_num = tf.placeholder(tf.int32, shape=[None], name="input_group_num")

W_r = tf.get_variable("w_r", [max_sequence_length, 1],
                      initializer=tf.random_uniform_initializer())

W_att = tf.diag(tf.truncated_normal([max_sequence_length], stddev=0.001))

outputs_x = tf.TensorArray(size=pseudo_batch_size, dtype=tf.float32)

# Since each sub group is independent of the others, we can execute all sub-groups in parallel
initial_loop_val, outputs_x = tf.while_loop(cond=condition, body=loop_body, loop_vars=(initial_loop_val, outputs_x),
                                            parallel_iterations=pseudo_batch_size)
group_representations = outputs_x.concat()

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    generated_group_representation = sess.run(group_representations, feed_dict={input_x: train_x,
                                                                                input_group_num: group_num_list})
    print("Shape of Generated Group Representation is %s" % (generated_group_representation.shape,))
    # print("Generated Group Representation is %s" % generated_group_representation)
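One possible loop-free alternative, sketched in NumPy under the assumption that the normalization is a per-group softmax: compute e_i for all K points at once, then take the softmax and the weighted sum within each group using the group ids. The function name segment_attention_pool is hypothetical. TensorFlow's segment ops (for example tf.unsorted_segment_max and tf.unsorted_segment_sum) look like they could express the same steps in-graph, but this sketch has not been ported to or tested in TensorFlow.

```python
import numpy as np

def segment_attention_pool(x, group_ids, num_groups, A, r):
    """Softmax-attention pooling for every group at once; returns [num_groups, d]."""
    e = (x @ A @ r)[:, 0]                        # e_i = x_i A r for all K points, shape [K]
    seg_max = np.full(num_groups, -np.inf)
    np.maximum.at(seg_max, group_ids, e)         # per-group max, for numerical stability
    w = np.exp(e - seg_max[group_ids])           # unnormalized weights, shape [K]
    seg_sum = np.zeros(num_groups)
    np.add.at(seg_sum, group_ids, w)             # per-group sum of the weights
    alpha = w / seg_sum[group_ids]               # softmax within each group
    s = np.zeros((num_groups, x.shape[1]))
    np.add.at(s, group_ids, alpha[:, None] * x)  # s_g = sum over i in g of alpha_i x_i
    return s

rng = np.random.RandomState(12)
d = 10
sizes = [3, 2, 5, 4, 2]                          # hypothetical sub-group sizes
group_ids = np.repeat(np.arange(len(sizes)), sizes)
x = rng.random_sample((len(group_ids), d))
A = np.diag(rng.normal(scale=0.001, size=d))     # hypothetical A and r
r = rng.random_sample((d, 1))
s = segment_attention_pool(x, group_ids, len(sizes), A, r)
print(s.shape)  # (5, 10)
```

The result matches a per-group loop with boolean masks, but every step runs once over all K points, so the number of groups does not need to be fixed when the computation is built.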

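Edit 3:

The use of boolean_mask results in the following warning:

"Converting sparse IndexedSlices to a dense Tensor of unknown shape."

For my specific use case, if this is actually a memory problem as the warning message suggests, is there an alternative to the mask operation that I can use? I am not sure how the tf.dynamic_partition op suggested in some SO posts could be used (efficiently) here.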

Answer 1

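Assuming you fill in W_att and W_r with the correct values later, your code should do what you intend: it builds a TensorArray in which each element is the result computed over one pseudo-batch. You will need to think carefully to make sure those values are filled in the right way. It is fairly inefficient, though, since you are performing several (possibly very large or sparse) matrix multiplications.

If you know ahead of time how many pseudo-batches you will have, you should use tf.dynamic_partition with group_num_list: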

x_partitions = tf.dynamic_partition(x, group_num_list, pseudo_batch_size)

x_partitions is a regular Python list in which x_partitions[0] is the stack of vectors from x whose entry in group_num_list is 0. For example (not actual code):

given    x.shape == [5, 100] and 
  group_num_list == [0,0,0,1,1] and
  num_partitions == 2,
then   len(x_partitions) == 2 and 
   x_partitions[0].shape == [3, 100]
   x_partitions[1].shape == [2, 100]
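To make these semantics concrete, here is a NumPy stand-in (dynamic_partition_np is a hypothetical name, not a TensorFlow API) that reproduces the example: partition p collects, in their original order, the rows of x whose entry in group_num_list equals p.

```python
import numpy as np

def dynamic_partition_np(data, partitions, num_partitions):
    """NumPy stand-in for the semantics of tf.dynamic_partition."""
    partitions = np.asarray(partitions)
    # Partition p gets the rows whose partition index equals p, in their original order.
    return [data[partitions == p] for p in range(num_partitions)]

x = np.arange(5 * 100, dtype=np.float32).reshape(5, 100)
x_partitions = dynamic_partition_np(x, [0, 0, 0, 1, 1], 2)
print([p.shape for p in x_partitions])  # [(3, 100), (2, 100)]
```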

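You can then use a native Python loop to run any further operations on each partition (this adds a bunch of extra nodes to the graph, but the computational overhead is very small).

I'm going to modify your get_attention_weighted_rep, mostly to use different names and to pass in all of the components explicitly. I'll call it x_to_s: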

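def x_to_s(x, A, r):
    # e_i = x_i A r, shape [group_size, 1]
    e = tf.matmul(tf.matmul(x, A), r)
    # softmax across the data points of the group (axis 0)
    alpha = tf.transpose(tf.nn.softmax(tf.transpose(e)))
    # s = sum_i alpha_i x_i, shape [1, features]
    return tf.reduce_sum(x * alpha, 0, keep_dims=True)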

Once this is done for each partition, you can concatenate the results to get s across all pseudo-batches:

# following on from our example earlier, let's use x_to_s 
# which takes an x with x.shape = [None, 100]
# and returns a tensor with shape = [1, 100]
s_parts = []
for p in x_partitions:
    # calculate/find A and r for this x
    s_parts.append(x_to_s(p, A, r))
s = tf.concat(s_parts, 0)
# this makes s.shape = [2, 100]
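The same partition, pool and concatenate pipeline can be sketched end to end in NumPy. x_to_s_np is a hypothetical mirror of x_to_s, and A and r are placeholder values, since how they are computed is left open here.

```python
import numpy as np

def x_to_s_np(x, A, r):
    """Hypothetical NumPy mirror of x_to_s: maps [n, d] to [1, d]."""
    e = x @ A @ r                                  # [n, 1]
    w = np.exp(e - e.max(axis=0, keepdims=True))
    alpha = w / w.sum(axis=0, keepdims=True)       # softmax over the n rows
    return (x * alpha).sum(axis=0, keepdims=True)

rng = np.random.RandomState(0)
x = rng.random_sample((5, 100))
group_num_list = np.array([0, 0, 0, 1, 1])
A = 0.01 * np.eye(100)                             # placeholder A
r = rng.random_sample((100, 1))                    # placeholder r

# Partition as tf.dynamic_partition would, pool each part, then stack the results.
x_partitions = [x[group_num_list == p] for p in range(2)]
s_parts = [x_to_s_np(p, A, r) for p in x_partitions]
s = np.concatenate(s_parts, axis=0)                # counterpart of tf.concat(s_parts, 0)
print(s.shape)  # (2, 100)
```

Each row of s is a convex combination of its partition's rows, so it always lies between that partition's column-wise minimum and maximum.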

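If you need a dynamic number of pseudo-batches at runtime, I don't think there is a simple way to do it.

I haven't read enough of the paper to know how A and r are computed, sorry.

Handling variable-sized sub-batches within a mini-batch in TensorFlow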
