Question

How do I skip entries in a TFRecord file when building a TFRecordDataset?

Given a TFRecord file and a tf.contrib.data.TFRecordDataset object, I create a new dataset by mapping a parser over it that uses the protobuf definition. For example:

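import tensorflow as tf

# Each record holds a single serialized string feature.
features = {'some_data': tf.FixedLenFeature([], tf.string)}

def parser(example_proto):
    # Decode one serialized tf.train.Example into a dict of tensors.
    e = tf.parse_single_example(example_proto, features)
    data = e['some_data']
    # ...do a bunch of stuff to data...
    return data

# filename, cache_filename and batch_size are defined elsewhere.
x = tf.contrib.data.TFRecordDataset(filename)
x = x.map(parser)
x = x.cache(cache_filename)
x = x.repeat()
x = x.batch(batch_size)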

This lets me read in the data, do some preprocessing, cache the result, and batch it for my model.

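My question is: what if I want to skip one of the TFRecord entries (for example, because the data is invalid or bad)? In parser(), could I return None, use some kind of tf.cond to signal an invalid entry, or check some assertion?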

Answer 1

(Summarizing the comments as an answer)

The Dataset filter() method lets you filter entries based on a predicate.
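
A minimal sketch of predicate-based filtering with Dataset.filter(), assuming TensorFlow 2.x with eager execution (where tf.data.TFRecordDataset and the tf.io.* functions replace the tf.contrib.data and tf.* names used in the question). The validity rule (an empty string counts as bad), the demo file, and the write_demo_file helper are made up for illustration. The parser returns a validity flag alongside the data, filter() drops the flagged entries, and a second map() strips the flag again:

```python
import os
import tempfile

import tensorflow as tf

features = {'some_data': tf.io.FixedLenFeature([], tf.string)}

def parser(example_proto):
    e = tf.io.parse_single_example(example_proto, features)
    data = e['some_data']
    # Hypothetical validity rule: an empty string marks a bad record.
    is_valid = tf.strings.length(data) > 0
    return data, is_valid

def write_demo_file(path, payloads):
    # Hypothetical helper: writes one tf.train.Example per payload.
    with tf.io.TFRecordWriter(path) as writer:
        for payload in payloads:
            example = tf.train.Example(features=tf.train.Features(feature={
                'some_data': tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[payload])),
            }))
            writer.write(example.SerializeToString())

filename = os.path.join(tempfile.mkdtemp(), 'demo.tfrecord')
write_demo_file(filename, [b'good-1', b'', b'good-2'])

x = tf.data.TFRecordDataset(filename)
x = x.map(parser)
# Keep only the entries whose flag is True.
x = x.filter(lambda data, is_valid: is_valid)
# Drop the flag so downstream stages see just the data.
x = x.map(lambda data, is_valid: data)

kept = [item.numpy() for item in x]
```

In the question's pipeline, the filter() and the second map() would go before cache(), so that only valid entries end up in the cache.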
