Question

I have numpy arrays of dtype int32 that are encoded with ndarray.tostring() and then written as a _bytes_feature into separate TFRecord files, for example:

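import numpy as np
import tensorflow as tf

_bytes_feature = lambda string: tf.train.Feature(bytes_list=tf.train.BytesList(value=[string]))
labels = np.array([1, 2, 3, 4], dtype=np.int32)  # int32, to match the tf.int32 used when decoding
labels_raw = labels.tostring()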
assert isinstance(labels_raw, bytes)
features = {"labels": _bytes_feature(labels_raw)}
example = tf.train.Example(features=tf.train.Features(feature=features))
with tf.python_io.TFRecordWriter(record_path + sound_name + "_{}.tfrecords".format(i)) as writer:
    writer.write(example.SerializeToString())

I read them back from the files with:

def parser(serialized_example):
    features_description = {'labels': tf.FixedLenFeature([], tf.string)}
    features = tf.parse_single_example(serialized_example, features_description)
    labels = tf.decode_raw(features['labels'], tf.int32)
    return labels

dataset = tf.data.TFRecordDataset(list_of_tfrecords_files)
dataset = dataset.map(parser)

This process works seamlessly for about 30,000 different label arrays, but for about 500 of them it fails with:

InvalidArgumentError: Key: labels.  Data types don't match. Data type: int64 but expected type: string
 [[{{node ParseSingleExample/ParseSingleExample}} = ParseSingleExample[Tdense=[DT_INT64, DT_STRING, DT_INT64, DT_STRING], dense_keys=["image/height", "image/mfcc", "image/width", "labels"], dense_shapes=[[], [], [], []], num_sparse=0, sparse_keys=[], sparse_types=[]](arg0, ParseSingleExample/Const, ParseSingleExample/Const_1, ParseSingleExample/Const, ParseSingleExample/Const_1)]]
 [[{{node IteratorGetNext}} = IteratorGetNext[output_shapes=[[?,?,?,1], [?,80]], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](IteratorV2)]]

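I have checked which label arrays this happens for, and there seems to be no obvious difference between them and the label arrays the process works for.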

>>> tf.__version__
'1.12.0'

Python 3.6.8

Answer 1

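This was actually human error. While I was learning how to create tfrecord files, I kept rewriting them, or so I thought. I had previously written the labels as an int64 list, and for some reason those files were never overwritten. That is why they were still int64 when the parser read them. After deleting all the records completely and rerunning both the creation and the parsing of the files, the process ran smoothly.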
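If deleting everything is not an option, the stale files can probably be found by checking how each file actually stores "labels". The following is only a rough sketch, not something taken from the original post: feature_kind and find_mismatched_files are hypothetical helper names, it assumes the TF 1.x API used in the question (tf.python_io.tf_record_iterator), and it only looks at the first record of each file.

```python
def feature_kind(path, key="labels"):
    """Return how the first record of a TFRecord file stores `key`:
    'bytes_list', 'int64_list', 'float_list', or None if it is missing."""
    import tensorflow as tf  # imported lazily; TF 1.x API is an assumption

    for record in tf.python_io.tf_record_iterator(path):
        example = tf.train.Example.FromString(record)
        if key not in example.features.feature:
            return None
        # Feature is a protobuf oneof named "kind"; this reports which field is set.
        return example.features.feature[key].WhichOneof("kind")
    return None  # the file contains no records


def find_mismatched_files(paths, expected="bytes_list", kind_of=feature_kind):
    """Return (path, kind) pairs for files whose feature kind is not `expected`.

    `kind_of` is a hypothetical hook so the check can be swapped out or tested.
    """
    mismatched = []
    for path in paths:
        kind = kind_of(path)
        if kind != expected:
            mismatched.append((path, kind))
    return mismatched
```

Calling find_mismatched_files(list_of_tfrecords_files) should then list the files still written with the old int64 layout, which can be deleted and regenerated.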

TFRecords encoded as BytesList with bytes, but only sometimes decoded from the records as int64, throwing InvalidArgumentError
