Question

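I'm having trouble reading "many" (more than 500) events from a TFRecord file. If I create a file with 500 events, everything works, but when I try to read and parse a file with more than 500, I get this error: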

W tensorflow/core/framework/op_kernel.cc:993] Invalid argument: Could not parse example input, value:
...
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb5 in position 40: invalid start byte

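The images are floats with shape (N, 2, 127, 50) (transposed to (N, 127, 50, 2) during reading). I tried writing them two different ways, as a bytes list and as a float list, and both fail the same way.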
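To make the layout change concrete, here is a small dependency-free sketch of what the permutation [0, 2, 3, 1] does to a row-major (N, 2, 127, 50) buffer. The helper names `transposed_shape` and `transpose_nchw_to_nhwc` are hypothetical and only for illustration; in the actual code this step is done by `tf.transpose`.

```python
def transposed_shape(shape, perm):
    """Return the shape that results from permuting the axes by `perm`."""
    return tuple(shape[axis] for axis in perm)


def transpose_nchw_to_nhwc(flat, shape):
    """Reorder a flat row-major (N, C, H, W) buffer into (N, H, W, C) order."""
    n, c, h, w = shape
    out = []
    for i in range(n):
        for y in range(h):
            for x in range(w):
                for ch in range(c):
                    # Row-major offset of element [i, ch, y, x] in the input.
                    out.append(flat[((i * c + ch) * h + y) * w + x])
    return out


# The shape used in the question, with N = 500.
new_shape = transposed_shape((500, 2, 127, 50), (0, 2, 3, 1))

# A tiny (1, 2, 2, 2) example so the reordering is easy to inspect.
small = transpose_nchw_to_nhwc(list(range(8)), (1, 2, 2, 2))
```

Note that this is a transpose, not a reshape: the elements are reordered in memory order, so reshaping (N, 2, 127, 50) directly to (N, 127, 50, 2) would give a different result.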

For the "bytes method", the relevant parts of the code are:

def write_to_tfrecord(data_dict, tfrecord_file):
    writer = tf.python_io.TFRecordWriter(tfrecord_file)
    features_dict = {}
    for k in data_dict.keys():
        features_dict[k] = tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[data_dict[k]['byte_data']])
        )
    example = tf.train.Example(
        features=tf.train.Features(feature=features_dict)
    )
    writer.write(example.SerializeToString())
    writer.close()

And then, to read:

def tfrecord_to_graph_ops_xtxutuvtv(filenames):
    def process_hitimes(inp, shape):
        hitimes = tf.decode_raw(inp, tf.float32)
        hitimes = tf.reshape(hitimes, shape)
        hitimes = tf.transpose(hitimes, [0, 2, 3, 1])
        return hitimes

    file_queue = tf.train.string_input_producer(filenames, name='file_queue')
    reader = tf.TFRecordReader()
    _, tfrecord = reader.read(file_queue)

    tfrecord_features = tf.parse_single_example(
        tfrecord,
        features={
            'hitimes-x': tf.FixedLenFeature([], tf.string),
        },
        name='data'
    )
    hitimesx = process_hitimes(
        tfrecord_features['hitimes-x'], [-1, 2, 127, 50]
    )
    return hitimesx

(Normally I read and write other tensors as well, but only this one causes a problem.)

For the "float method", the code looks like this:

def write_to_tfrecord(data_dict, tfrecord_file):
    writer = tf.python_io.TFRecordWriter(tfrecord_file)
    features_dict = {}
    features_dict['hitimes-x'] = tf.train.Feature(
        float_list=tf.train.FloatList(
            value=data_dict['hitimes-x']['data'].flatten()
        )
    )
    example = tf.train.Example(
        features=tf.train.Features(feature=features_dict)
    )
    writer.write(example.SerializeToString())
    writer.close()

And, when reading:

def tfrecord_to_graph_ops_xtxutuvtv(filenames):
    def process_hitimes(inp, shape):
        hitimes = tf.sparse_tensor_to_dense(inp)
        hitimes = tf.reshape(hitimes, shape)
        hitimes = tf.transpose(hitimes, [0, 2, 3, 1])
        return hitimes

    file_queue = tf.train.string_input_producer(filenames, name='file_queue')
    reader = tf.TFRecordReader()
    _, tfrecord = reader.read(file_queue)

    tfrecord_features = tf.parse_single_example(
        tfrecord,
        features={
            'hitimes-x': tf.VarLenFeature(tf.float32),
        },
        name='data'
    )
    hitimesx = process_hitimes(
        tfrecord_features['hitimes-x'], [-1, 2, 127, 50]
    )
    return hitimesx

The data being written is a NumPy ndarray of type float32.
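For the bytes method, the round trip amounts to serializing float32 values to raw bytes and reinterpreting those bytes as float32 on the way back, which is what `tf.decode_raw(inp, tf.float32)` does on the reading side. A minimal sketch of that idea follows. It uses the standard-library `array` module as a stand-in for the NumPy array so that it runs without dependencies, and the helper names are hypothetical.

```python
from array import array


def floats_to_bytes(values):
    """Pack values as 4-byte floats in native byte order."""
    return array("f", values).tobytes()


def bytes_to_floats(raw):
    """Reinterpret a raw byte string as a list of 4-byte floats."""
    out = array("f")
    out.frombytes(raw)
    return list(out)


# Values chosen to be exactly representable in float32.
values = [0.0, 1.5, -2.25, 1024.0]
raw = floats_to_bytes(values)
restored = bytes_to_floats(raw)
```

Each float32 takes 4 bytes, so the byte string for one (2, 127, 50) event is 2 * 127 * 50 * 4 = 50,800 bytes long.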

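I'm tempted to think this is a bug (I'm using TensorFlow 1.0), because both methods work fine for up to ~500 images but break when I try to use more. I looked through the documentation for parameters I should add so that the readers and writers can handle larger files, but I didn't find anything (besides, 500 images is not a lot - I need to write tens of millions).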

Any ideas? I plan to try TensorFlow 1.2 today, but haven't had a chance yet.

Answer 1

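I upgraded to TF 1.2.1 and the problem above went away (at least when using BytesList - I'm not sure which method is more idiomatic TensorFlow, but treating everything as a BytesList of byte data makes for simpler code for me).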

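I think a new problem shows up when reading large files (I can now write over 25k events to a TFRecord file, probably more): TF opens the entire file at once and loads all of it into memory, which is more than my data-processing test machine can handle. I don't directly blame TensorFlow for this, although I need to come up with some sort of convenient compression or chunking scheme.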
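As a rough illustration of why memory becomes an issue and what a chunking scheme could look like, the sketch below estimates the raw payload size for one (2, 127, 50) float32 tensor per event and plans how events would be split across several smaller files. The function names, the shard size, and the file naming pattern are all assumptions made for this example, not part of TensorFlow or of the code above.

```python
FLOATS_PER_EVENT = 2 * 127 * 50  # shape of one event, taken from the question
BYTES_PER_FLOAT32 = 4


def payload_bytes(n_events):
    """Raw size in bytes of one float32 tensor for `n_events` events."""
    return n_events * FLOATS_PER_EVENT * BYTES_PER_FLOAT32


def plan_shards(n_events, events_per_shard, prefix="events"):
    """Yield (filename, start, stop) tuples covering all events in order."""
    for shard, start in enumerate(range(0, n_events, events_per_shard)):
        stop = min(start + events_per_shard, n_events)
        yield "{}-{:05d}.tfrecord".format(prefix, shard), start, stop


size_500 = payload_bytes(500)     # 25,400,000 bytes, about 25 MB
size_25k = payload_bytes(25000)   # 1,270,000,000 bytes, about 1.27 GB
shards = list(plan_shards(1200, 500))
```

Each (filename, start, stop) tuple would then drive one call to the writer with the slice data[start:stop], so that no single file has to hold every event.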

TensorFlow TFRecord reading crashes with many images
