Question

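I have some variable-length sequence data made up of tensors of fixed size. Consider a list s_1, ..., s_b of tensors with shapes [l_1, m, n], ..., [l_b, m, n], i.e. sequences of fixed-size matrices. For example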

s_1 = [ [[1,2],[3,4]], [[5,6],[7,8]] ]
s_2 = [ [[9,10],[11,12]], [[13,14],[15,16]], [[17,18],[19,20]] ]

The data, however, is provided in padded form as follows

S = [ [[[1,2],[3,4]], [[5,6],[7,8]], [[0,0],[0,0]]], 
    [[[9,10],[11,12]], [[13,14],[15,16]], [[17,18],[19,20]]] ]
l = [2,3]

where S is the tensor of padded lists of matrices and l is a 1-D tensor whose i-th entry is the length of sequence i.

Now I would like to extract the tensor given by concatenating the lists of matrices. The result should be

[ [[1,2],[3,4]], [[5,6],[7,8]], [[9,10],[11,12]], [[13,14],[15,16]], [[17,18],[19,20]] ]

This computation is to be done inside a function passed to the map method of tf.data.Dataset.

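What is the proper way to do this in TensorFlow? I have thought of using tf.gather_nd together with tf.boolean_mask, but could not quite get the result I wanted. Maybe a clever use of tf.sequence_mask?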
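To pin down the intended semantics independently of TensorFlow, here is a minimal pure-Python sketch of the "mask, then gather" idea applied to the example data. The helpers `sequence_mask` and `boolean_mask` are hypothetical stand-ins modeled on the behavior of tf.sequence_mask and tf.boolean_mask; they are not the TensorFlow functions themselves.

```python
def sequence_mask(lengths, maxlen):
    """Hypothetical helper: row i is True for the first lengths[i] positions."""
    return [[j < n for j in range(maxlen)] for n in lengths]


def boolean_mask(padded, mask):
    """Hypothetical helper: keep padded[i][j] wherever mask[i][j] is True,
    flattening the first two axes into one."""
    return [
        item
        for row, mask_row in zip(padded, mask)
        for item, keep in zip(row, mask_row)
        if keep
    ]


# Padded data and lengths from the example above.
S = [
    [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[0, 0], [0, 0]]],
    [[[9, 10], [11, 12]], [[13, 14], [15, 16]], [[17, 18], [19, 20]]],
]
l = [2, 3]

mask = sequence_mask(l, maxlen=len(S[0]))
result = boolean_mask(S, mask)
print(result)
```

The padding matrix in the first sequence is dropped because its mask entry is False, and the remaining matrices come out in sequence order.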

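Edit

tf.boolean_mask(S,tf.sequence_mask(l))

seems to give the right result, but I can only get it to work outside of the function passed to the map method of tf.data.Dataset.

import tensorflow as tf

def _parse_SE(in_example_proto):
    S = ...
    l = ... #obtain S and l from the record

    return tf.tuple([S,l])
dataset = tf.data.TFRecordDataset("test.txt")
dataset = dataset.map(_parse_SE)
# one padded shape per returned component: S is [None, m, n], l is a scalar
dataset = dataset.padded_batch(BATCH_SIZE, padded_shapes=([None,m,n], []))
iterator = dataset.make_initializable_iterator()
[S_bat, l_bat] = iterator.get_next()
wanted_bat = tf.boolean_mask(S_bat,tf.sequence_mask(l_bat)) # when evaluated wanted_bat stores the wanted concatenation

works. But if I try to make the modification inside the map function, I only get S again:

def _parse_SE(in_example_proto):
    S = ...
    l = ... #obtain S and l from the record
    wanted = tf.boolean_mask(S,tf.sequence_mask(l)) # masks a single, unpadded sequence

    return tf.tuple([wanted,l])
dataset = tf.data.TFRecordDataset("test.txt")
dataset = dataset.map(_parse_SE)
# one padded shape per returned component: wanted is [None, m, n], l is a scalar
dataset = dataset.padded_batch(BATCH_SIZE, padded_shapes=([None,m,n], []))
iterator = dataset.make_initializable_iterator()
[wanted_bat, l_bat] = iterator.get_next() # when evaluated wanted_bat just contains S_bat of the previous example

Edit 2

The second approach does not work because the function passed to map is applied to each element of the batch individually, not to the whole batch at once. Inside _parse_SE, S is a single unpadded sequence, so the mask keeps every entry and padded_batch pads it again afterwards. Thus we cannot move the batch-level reconstruction inside _parse_SE.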
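Given that the function passed to map only sees one element at a time, one possible workaround is to apply a second map after padded_batch, so that the mapped function receives a whole padded batch. The following is a minimal sketch under stated assumptions, not a confirmed solution: it assumes TensorFlow 2.4 or later in eager mode (unlike the TF 1.x iterator API used above), and it replaces the TFRecord parsing with a hypothetical in-memory generator `gen` that yields the example sequences. The function name `unpad_batch` is also made up for illustration.

```python
import tensorflow as tf

# Example sequences from the question (m = n = 2).
s_1 = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
s_2 = [[[9, 10], [11, 12]], [[13, 14], [15, 16]], [[17, 18], [19, 20]]]


def gen():
    # Hypothetical stand-in for parsing records: yields (sequence, length).
    for s in (s_1, s_2):
        yield s, len(s)


dataset = tf.data.Dataset.from_generator(
    gen,
    output_signature=(
        tf.TensorSpec(shape=(None, 2, 2), dtype=tf.int32),
        tf.TensorSpec(shape=(), dtype=tf.int32),
    ),
)
# Pad every sequence in the batch to the longest one; lengths stay scalars.
dataset = dataset.padded_batch(2, padded_shapes=([None, 2, 2], []))


def unpad_batch(S_bat, l_bat):
    # This function runs on a whole padded batch, not on a single element.
    mask = tf.sequence_mask(l_bat, maxlen=tf.shape(S_bat)[1])
    return tf.boolean_mask(S_bat, mask), l_bat


dataset = dataset.map(unpad_batch)
wanted_bat, l_bat = next(iter(dataset))
print(wanted_bat.numpy().tolist())
```

Because the second map comes after padded_batch, the mask is built from all lengths in the batch at once, which is the step that cannot happen inside the per-element parse function.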

Gathering sub-tensors of a padded tensor for variable-size data in TensorFlow

0 answers: