Question

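I am working on this project based on TensorFlow.

I just want to train an OCR model with attention_ocr on my own dataset, but I don't know how to store my images and ground truth in the same format as the FSNS dataset.

Has anybody else worked on this project, or does anyone know how to solve this problem?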

Answer 1

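The data format for storing the training/test sets is defined in the FSNS paper https://arxiv.org/pdf/1702.03970.pdf (Table 4).

To store tfrecord files with tf.Example protos you can use tf.python_io.TFRecordWriter. There is a nice tutorial, an existing answer on Stack Overflow, and a short gist.

Assume you have a numpy ndarray img which holds num_of_views images stored side by side (see Fig. 3 in the paper), and the corresponding text in a variable text. You will need to define a function that converts a unicode string into a list of character ids, both padded to a fixed length and unpadded. For example: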

char_ids_padded, char_ids_unpadded = encode_utf8_string(
   text='abc', 
   charset={'a':0, 'b':1, 'c':2},
   length=5,
   null_char_id=3)

The result should be:

char_ids_padded = [0,1,2,3,3]
char_ids_unpadded = [0,1,2]
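
The body of encode_utf8_string is not given in this answer. A minimal sketch that reproduces the result above might look like the following. It assumes that charset maps every character of text to an integer id and that text is no longer than length; how to handle unknown characters or overlong strings is left open here.

```python
def encode_utf8_string(text, charset, length, null_char_id):
    """Map text to character ids and return (padded, unpadded) id lists.

    Assumptions of this sketch: charset maps every character of text to an
    int id, and text has at most `length` characters.
    """
    if len(text) > length:
        raise ValueError('text is longer than %d characters' % length)
    # One id per character, in order.
    char_ids_unpadded = [charset[c] for c in text]
    # Fill the remaining positions with the null character id.
    padding = [null_char_id] * (length - len(char_ids_unpadded))
    return char_ids_unpadded + padding, char_ids_unpadded
```

Raising on overlong text is just one choice; truncating to length would also give fixed-size labels, at the cost of silently dropping characters.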

If you use the functions _int64_feature and _bytes_feature defined in the gist, you can create an FSNS-compatible tf.Example proto using the following snippet:

char_ids_padded, char_ids_unpadded = encode_utf8_string(
   text, charset, length, null_char_id)
example = tf.train.Example(features=tf.train.Features(
  feature={
    'image/format': _bytes_feature("PNG"),
    'image/encoded': _bytes_feature(img.tostring()),
    'image/class': _int64_feature(char_ids_padded),
    'image/unpadded_class': _int64_feature(char_ids_unpadded),
    'height': _int64_feature(img.shape[0]),
    'width': _int64_feature(img.shape[1]),
    'orig_width': _int64_feature(img.shape[1] // num_of_views),  # integer division: the feature must be an int
    'image/text': _bytes_feature(text)
  }
))

Answer 2

You should not use the following line as is:

'image/encoded': _bytes_feature(img.tostring()),

In my code, I wrote this instead:

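import cv2

_, jpegVector = cv2.imencode('.jpeg', img)  # compress the image to JPEG in memory
imgStr = jpegVector.tobytes()  # tobytes() replaces the deprecated tostring()
# then, inside the feature dict:
'image/encoded': _bytes_feature(imgStr)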
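
Presumably the point is that 'image/encoded' should hold the bytes of a compressed image file, whereas img.tostring() yields the raw pixel buffer with no header. A quick sanity check before writing a record can catch that mistake. The helper below, sniff_image_format, is a hypothetical name for this sketch and is not part of TensorFlow or attention_ocr; it only inspects the leading magic bytes.

```python
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'  # fixed 8-byte header of every PNG file
JPEG_SOI = b'\xff\xd8\xff'            # JPEG start-of-image marker plus next marker prefix


def sniff_image_format(data):
    """Return 'PNG', 'JPEG', or None based on the leading bytes of data."""
    if data.startswith(PNG_SIGNATURE):
        return 'PNG'
    if data.startswith(JPEG_SOI):
        return 'JPEG'
    return None
```

For the snippet above, sniff_image_format(imgStr) should report 'JPEG', while a raw pixel buffer will normally give None.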

How do I create a dataset in the same format as the FSNS dataset?