Question

I am using TensorFlow's tf.data.Dataset API and have created a small dataset like this:

dataset = tf.data.Dataset.from_tensor_slices(({'reviews': x}, y))

x holds movie reviews (strings) and y holds their labels (strings). I am trying to apply some preprocessing to each review:

def preprocess(x, y):
    # split on whitespace
    x['reviews'] = tf.string_split([x['reviews']])
    # turn into integers
    x['reviews'], y = data_table.lookup(x['reviews']), labels_table.lookup(y)
    x['reviews'] = tf.sparse_tensor_to_dense(x['reviews'])
    x['reviews'] = tf.slice(x['reviews'], [0, 0], [-1, 100])
    x['reviews'] = tf.pad(x['reviews'],
               paddings=[[100 - tf.shape(x['reviews'])[0], 0]],
               mode='CONSTANT',
               name='pad_input',
               constant_values=-1)
    y = tf.one_hot(y, depth=20)
    return x, y

Then I do:

dataset = dataset.map(preprocess)

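The goal of the above is to:

1) Take my string review

2) Split it on whitespace

3) Convert the review into an array of integers

4) Truncate the array to length 100 (so if the length was originally 125, only the first 100 elements are used)

5) Finally, pad reviews with fewer than 100 words with -1 so that they reach length 100.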
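For reference, steps 4 and 5 can be sketched in plain Python, independent of TensorFlow. This is only an illustration of the intended result, not the TensorFlow code itself: the helper name `truncate_and_pad` is hypothetical, and padding at the end is an assumption (the `paddings` argument in `preprocess` above places the padding at the front).

```python
def truncate_and_pad(token_ids, max_len=100, pad_value=-1):
    """Return a list of exactly `max_len` items (hypothetical helper).

    Longer inputs keep only their first `max_len` items; shorter inputs
    are padded at the end with `pad_value` (end padding is an assumption).
    """
    # Python slicing clamps to the available length, so this is safe
    # even when the input has fewer than `max_len` items.
    truncated = list(token_ids[:max_len])
    missing = max_len - len(truncated)
    return truncated + [pad_value] * missing


# A 125-token review keeps its first 100 ids; a 91-token review
# keeps all 91 ids and gains 9 pad values.
long_review = truncate_and_pad(list(range(125)))
short_review = truncate_and_pad(list(range(91)))
```

Both `long_review` and `short_review` end up with length 100, which is the shape the model input expects.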

When I pass this dataset to my keras.model.fit(...) call, I get back:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Expected size[1] in [0, 91], but got 100
     [[Node: Slice = Slice[Index=DT_INT32, T=DT_INT64](SparseToDense, Slice/begin, Slice/size)]]
     [[Node: IteratorGetNext_1 = IteratorGetNext[output_shapes=[[?,100], [?,20]], output_types=[DT_INT64, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](Iterator_1)]]

This makes me think the input review has length 91 and that this is somehow messing up the slice? Any suggestions would be great!
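One way to read the message is as a bounds check on the requested slice size: along axis 1 the tensor has 91 entries, so any size from 0 to 91 is accepted and 100 is not. The plain-Python sketch below mimics that check and shows a clamped alternative. Both function names are hypothetical, and this models the reported behaviour rather than reproducing TensorFlow's implementation.

```python
def check_slice_size(available, requested):
    """Mimic the range check reported in the error (hypothetical helper)."""
    if not 0 <= requested <= available:
        raise ValueError(
            "Expected size in [0, %d], but got %d" % (available, requested)
        )
    return requested


def clamp_slice_size(available, requested):
    """Largest slice size that still fits (hypothetical helper)."""
    return min(available, requested)


# A 91-token review cannot provide a 100-wide slice, but a clamped
# request fits; a 125-token review is simply cut down to 100.
size_for_short = clamp_slice_size(91, 100)
size_for_long = clamp_slice_size(125, 100)
```

Under this reading, a review shorter than 100 tokens would need either a clamped slice size or padding before the slice, so that 100 elements actually exist along that axis.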

Tensorflow: tf.pad causes a shape error, unclear why

0 answers: