Question

I am using the tf.data.Dataset API in TensorFlow. I have created a Dataset object as follows:

dataset = tf.data.Dataset.from_tensor_slices((data, labels))
dataset = dataset.map(lambda x, y: ({'review': x}, y))

So my dataset now consists of tuples whose first element is a dictionary and whose second element is an array of strings.

I am trying to do some basic string preprocessing with this function:

def preprocess(x, y):
    # split on whitespace
    logger.info(type(x))
    logger.info(type(y))
    x['sequence'] = tf.string_split([x['review']])
    logger.info(x['sequence'])
    return x, y

The last logging statement tells me that x['sequence'] is:

SparseTensor(indices=Tensor("StringSplit:0", shape=(?, 2), dtype=int64), values=Tensor("StringSplit:1", shape=(?,), dtype=string), dense_shape=Tensor("StringSplit:2", shape=(2,), dtype=int64))

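Why does indices have shape (?, 2)? Shouldn't string_split simply split on whitespace and produce a Tensor of whatever shape results (or at least one as wide as the maximum length needed)?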
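For context on the printed shapes: a SparseTensor stores one row of indices per non-empty value, and each row holds one coordinate per dimension of the dense result. The sketch below is plain Python, not TensorFlow's implementation; the helper name sparse_string_split is hypothetical. It only mimics the layout, assuming the dense result is two-dimensional (batch row, token position), which would account for the 2 in (?, 2), with ? being the token count that is unknown until the graph runs.

```python
def sparse_string_split(strings):
    """Mimic the sparse layout of a whitespace split over a batch of strings.

    Hypothetical helper for illustration only; it is not a TensorFlow API.
    """
    indices = []     # one [row, col] pair per token
    values = []      # the tokens themselves, in row-major order
    max_tokens = 0   # width of the equivalent dense matrix
    for row, text in enumerate(strings):
        tokens = text.split()  # split on runs of whitespace, dropping empties
        for col, token in enumerate(tokens):
            indices.append([row, col])
            values.append(token)
        max_tokens = max(max_tokens, len(tokens))
    dense_shape = [len(strings), max_tokens]
    return indices, values, dense_shape


indices, values, dense_shape = sparse_string_split(["hello world", "a b c"])
print(indices)      # five tokens, so five [row, col] pairs
print(values)
print(dense_shape)  # [number of input strings, longest token count]
```

Under this reading, the second dimension of indices is fixed at 2 regardless of how many tokens there are, while the first dimension grows with the total number of tokens across the batch.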

Thanks!

TensorFlow: strange behavior of string_split

0 answers: