Creating a TFRecord file results in a UnicodeDecodeError when reading it back

Asked: 2018-03-30 01:58:37

Tags: tensorflow unicode object-detection tfrecord

I am following the instructions in this example to create a TFRecord file for object detection:

https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/using_your_own_dataset.md

I created a Jupyter Notebook with Python 3.6.4 and TensorFlow 1.6.0 and followed those instructions.

I changed the value assignments in create_tf_example so that they take the correct information from my example (which is a PIL image):

import io

import tensorflow as tf
from object_detection.utils import dataset_util


def create_tf_example(example):
    height = example.height
    width = example.width
    filename = tf.compat.as_bytes(example.filename)

    # convert Image to bytes for TF
    imgByteArr = io.BytesIO()
    example.save(imgByteArr, format='PNG')
    imgByteArr = imgByteArr.getvalue()
    encoded_image_data = tf.compat.as_bytes(imgByteArr, encoding='utf-8') # Encoded image bytes

    image_format = b'png'

    xmins = [0]
    xmaxs = [width]
    ymins = [0]
    ymaxs = [height]

    classes_text = [b'Test']
    classes = [1] 

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_image_data),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))

    return tf_example

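The file is created without any problem, but when I try to read it back I get an error. It is the same error I get later when I try to read the TFRecord file through TensorFlow (label_map_util.load_labelmap(PATH_TO_LABELS)):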

open('data/tfrecord/label_map.pbtxt').read()

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-26-dfa57505da97> in <module>()
----> 1 open('data/tfrecord/label_map.pbtxt').read()

~/Documents/.../bin/../lib/python3.6/codecs.py in decode(self, input, final)
    319         # decode input (taking the buffer into account)
    320         data = self.buffer + input
--> 321         (result, consumed) = self._buffer_decode(data, self.errors, final)
    322         # keep undecoded input until the next call
    323         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbe in position 10: invalid start byte
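
The error can be reproduced without TensorFlow. The following is only a minimal sketch: it assumes the file begins with the same bytes as the binary dump shown further down in this post, and the names `leading_bytes`, `failing_offset` and `failing_byte` are made up for the illustration. Decoding those bytes as UTF-8, which is what a text-mode read does when UTF-8 is the default encoding, fails on the same byte and at the same offset as in the traceback.

```python
# First 12 bytes of the file, copied from the binary dump in this post.
leading_bytes = b'kX\x00\x00\x00\x00\x00\x00\x05@\xbe\xe0'

failing_offset = None
failing_byte = None

try:
    # Text-mode open(...).read() performs an equivalent decode step.
    leading_bytes.decode('utf-8')
except UnicodeDecodeError as err:
    # err.start is the offset of the first byte that could not be decoded.
    failing_offset = err.start
    failing_byte = leading_bytes[err.start]
    print(err)

print(failing_offset, hex(failing_byte))
```

This suggests the exception comes from treating binary content as text, not from anything specific to how the strings inside the example were encoded.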

I am not sure what I should do, since tf_example.SerializeToString() seems to be encoding the strings in the example.

Reading the file with open('data/tfrecord/label_map.pbtxt', 'rb').read() instead gives the following output:
b'kX\x00\x00\x00\x00\x00\x00\x05@\xbe\xe0\n\xe7\xb0\x01\n\x15\n\x0bimage/width\x12\x06\x1a\x04\n\x02\x98\x03\n\x17\n\x0cimage/format\x12\x07\n\x05\n\x03png\n!\n\x18image/object/class/label\x12\x05\x1a\x03\n\x01\x01\n\x16\n\x0cimage/height\x12\x06\x1a\x04\n\x02\x98\x03\nM\n\x17image/object/class/text\x122\n0\n.Qma8oN1eQwAiKUQZJRXry1VD2yCwYWnZQ6rtQwsC8LzjDu\nR\n\x0fimage/source_id\x12?\n=\n;data/png/Qma8oN1eQwAiKUQZJRXry1VD2yCwYWnZQ6rtQwsC8LzjDu.png\n"\n\x16image/object/bbox/ymin\x12\x08\x12\x06\n\x04\x00\x00\x00\x00\n\xf2\xac\x01\n\rimage/encoded\x12\xdf\xac\x01\n\xdb\xac\x01\n\xd7\xac\x01\x89PNG\r\n
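
The first bytes of this dump look like TFRecord framing, not a text label map. The sketch below is a hedged illustration and not a replacement for a real TFRecord reader. It assumes the usual TFRecord layout (an 8-byte little-endian record length, then a 4-byte masked CRC of that length, then the payload) and the standard protobuf varint encoding. The names `header`, `payload_start` and `read_varint` are hypothetical, and the byte strings are copied from the dump above.

```python
import struct

# Bytes 0-11 of the dump: record length (uint64) + masked CRC of the length (uint32).
header = b'kX\x00\x00\x00\x00\x00\x00\x05@\xbe\xe0'
# Bytes 12-15 of the dump: start of the serialized payload.
payload_start = b'\n\xe7\xb0\x01'

record_length, masked_length_crc = struct.unpack('<QI', header)


def read_varint(data, pos):
    """Decode a protobuf varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        value |= (byte & 0x7F) << shift
        pos += 1
        if not byte & 0x80:
            return value, pos
        shift += 7


tag = payload_start[0]
field_number = tag >> 3   # which field of the message this is
wire_type = tag & 0x07    # 2 means length-delimited
features_length, next_pos = read_varint(payload_start, 1)

print(record_length, field_number, wire_type, features_length)
```

Under these assumptions the header declares a 22635-byte record, and the payload opens with a length-delimited field 1 of 22631 bytes, which together with its 4 bytes of tag and length accounts for the whole record. That is consistent with a single serialized tf.train.Example, so it may be worth checking whether the TFRecord output was written to the label map path, and reading such a file in binary mode or with a TFRecord reader.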

Thanks!

0 answers:

No answers yet.