tensorflow: from TFRecord

Date: 2018-05-28 02:09:52

Tags: tensorflow tensorflow-datasets tfrecord

I'm using the SequenceExample protobuf to read and write time-series data to a TFRecord file.

I serialize a pair of np arrays as follows:

writer = tf.python_io.TFRecordWriter(file_name)

context = tf.train.Features( ... Feature( ... ) ... )

feature_data = tf.train.FeatureList(feature=[
                  tf.train.Feature(float_list=tf.train.FloatList(value=
                                   np.random.normal(size=[4065000])))])
labels = tf.train.FeatureList(feature=[
                  tf.train.Feature(int64_list=tf.train.Int64List(value=
                           np.random.randint(0, 11, size=[1084])))])

## feature_data and labels have similar but varying lengths

feature_list = {"feature_data": feature_data,
                "labels": labels}

feature_lists = tf.train.FeatureLists(feature_list=feature_list)
example = tf.train.SequenceExample(context=context,
                                   feature_lists=feature_lists)

## serialize and close
writer.write(example.SerializeToString())
writer.close()
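To see what this writer actually produces, it helps to inspect the proto directly. The sketch below is a hedged illustration, assuming TensorFlow 2.x and small hypothetical sizes (40 and 10 instead of 4065000 and 1084); `build_example` is a hypothetical helper that mirrors the layout above. It shows that each FeatureList contains a single time step whose value list holds every number:

```python
import numpy as np
import tensorflow as tf

def build_example(feature_values, label_values):
    """Hypothetical helper mirroring the writer above: ONE Feature (one step) per FeatureList."""
    feature_data = tf.train.FeatureList(feature=[
        tf.train.Feature(float_list=tf.train.FloatList(value=feature_values))])
    labels = tf.train.FeatureList(feature=[
        tf.train.Feature(int64_list=tf.train.Int64List(value=label_values))])
    return tf.train.SequenceExample(feature_lists=tf.train.FeatureLists(
        feature_list={"feature_data": feature_data, "labels": labels}))

# Small hypothetical sizes; .tolist() hands protobuf plain Python numbers.
example = build_example(np.random.normal(size=40).tolist(),
                        np.random.randint(0, 11, size=10).tolist())

fl = example.feature_lists.feature_list
num_steps = len(fl["feature_data"].feature)                      # time steps in the sequence
values_per_step = len(fl["feature_data"].feature[0].float_list.value)
print(num_steps, values_per_step)  # 1 40
```

So the sequence length is 1 and all of the variability lives inside that one step, which is what determines the feature type needed when parsing.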

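When trying to read the .tfrecords file I ran into a lot of errors, mainly because the SequenceExample protobuf writes the time-series data as a flat series of values (e.g. value: -12.2549, value: -18.1372, ..., value: 13.1234). My code for reading the .tfrecords file is: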
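The `value: ...` listing is simply how a repeated float field prints. If one value per time step is what you actually want, one option (a sketch, not the asker's method) is to emit one Feature per step; each step then has the fully defined shape `[]`, so `tf.io.FixedLenSequenceFeature([], ...)` can parse a varying number of steps. This assumes TensorFlow 2.x eager execution; `build_per_step_example` is hypothetical:

```python
import numpy as np
import tensorflow as tf

def build_per_step_example(feature_values, label_values):
    """Hypothetical alternative layout: one Feature per time step, one value per Feature."""
    feature_data = tf.train.FeatureList(feature=[
        tf.train.Feature(float_list=tf.train.FloatList(value=[v]))
        for v in feature_values])
    labels = tf.train.FeatureList(feature=[
        tf.train.Feature(int64_list=tf.train.Int64List(value=[v]))
        for v in label_values])
    return tf.train.SequenceExample(feature_lists=tf.train.FeatureLists(
        feature_list={"feature_data": feature_data, "labels": labels}))

serialized = build_per_step_example(
    np.random.normal(size=40).tolist(),
    np.random.randint(0, 11, size=10).tolist()).SerializeToString()

# Each step has the fully defined shape [], so FixedLenSequenceFeature accepts it;
# only the number of steps varies from record to record.
_, sequence = tf.io.parse_single_sequence_example(
    serialized,
    sequence_features={
        "feature_data": tf.io.FixedLenSequenceFeature([], tf.float32),
        "labels": tf.io.FixedLenSequenceFeature([], tf.int64),
    })
print(sequence["feature_data"].shape, sequence["labels"].shape)  # (40,) (10,)
```

The trade-off is that every step carries a few bytes of protobuf framing, which adds up at millions of steps, so this layout is probably better suited to shorter sequences.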

dataset = tf.data.TFRecordDataset("data/tf_record.tfrecords")
dataset = dataset.map(decode)
next_element = dataset.make_one_shot_iterator().get_next()
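A rough way to see what the dataset yields before `decode` runs: each element of a `TFRecordDataset` is one scalar `tf.string` tensor holding one serialized SequenceExample. The sketch assumes TensorFlow 2.x, where `tf.io.TFRecordWriter` replaces `tf.python_io.TFRecordWriter` and eager iteration replaces `make_one_shot_iterator()`; `make_record` and the temporary file path are hypothetical:

```python
import os
import tempfile
import tensorflow as tf

def make_record(n_values):
    """Hypothetical helper: one SequenceExample with a single step of n_values floats."""
    fl = tf.train.FeatureList(feature=[
        tf.train.Feature(float_list=tf.train.FloatList(value=[0.5] * n_values))])
    return tf.train.SequenceExample(feature_lists=tf.train.FeatureLists(
        feature_list={"feature_data": fl}))

path = os.path.join(tempfile.mkdtemp(), "tf_record.tfrecords")
with tf.io.TFRecordWriter(path) as writer:
    for n in (3, 5):                      # two records of different lengths
        writer.write(make_record(n).SerializeToString())

dataset = tf.data.TFRecordDataset(path)
print(dataset.element_spec)               # a scalar tf.string TensorSpec

# In eager mode the dataset is a plain Python iterable; no iterator object needed.
lengths = []
for raw in dataset:
    ex = tf.train.SequenceExample.FromString(raw.numpy())
    lengths.append(len(ex.feature_lists.feature_list["feature_data"].feature[0].float_list.value))
print(lengths)  # [3, 5]
```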

### reshape tensors and feed to estimator###

My decode() function is defined as follows:

def decode(serialized_proto):
    context_features = {...}
    sequence_features = {"feature_data": tf.FixedLenSequenceFeature((None,), 
                                                                tf.float32),
                         "labels": tf.FixedLenSequenceFeature((None,),
                                                                 tf.int64)}

    context, sequence = tf.parse_single_sequence_example(serialized_proto,
                                        context_features=context_features,
                                        sequence_features=sequence_features)

    return context, sequence

One of the errors is:

Shape [?] is not fully defined for 'ParseSingleSequenceExample/ParseSingleSequenceExample' (op: 'ParseSingleSequenceExample') with input shapes: [], [0], [], [], [], [], [], [], [].

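My main question is how to think about the structure of the dataset. I'm not sure I really understand the structure of the data that is returned, and I'm having a hard time iterating over this dataset and returning variable-sized tensors. Thanks in advance!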
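For batching records whose lengths differ, one common approach is to turn each parsed sequence into a dense 1-D tensor and let `padded_batch` pad to the longest element in each batch. This is a sketch under several assumptions: TensorFlow 2.2 or later (where `padded_shapes` may be omitted), the single-step layout from the question, `tf.io.VarLenFeature` for parsing, and in-memory records from a hypothetical `serialize` helper standing in for the `.tfrecords` file:

```python
import numpy as np
import tensorflow as tf

def serialize(n_values, n_labels):
    """Hypothetical helper: the question's layout, one step holding all values."""
    fl = {
        "feature_data": tf.train.FeatureList(feature=[tf.train.Feature(
            float_list=tf.train.FloatList(value=np.random.normal(size=n_values).tolist()))]),
        "labels": tf.train.FeatureList(feature=[tf.train.Feature(
            int64_list=tf.train.Int64List(value=np.random.randint(0, 11, size=n_labels).tolist()))]),
    }
    return tf.train.SequenceExample(
        feature_lists=tf.train.FeatureLists(feature_list=fl)).SerializeToString()

def decode(serialized_proto):
    _, sequence = tf.io.parse_single_sequence_example(
        serialized_proto,
        sequence_features={"feature_data": tf.io.VarLenFeature(tf.float32),
                           "labels": tf.io.VarLenFeature(tf.int64)})
    # Sparse [1, n] -> dense [1, n] -> 1-D [n]; padded_batch cannot pad SparseTensors.
    return {k: tf.sparse.to_dense(v)[0] for k, v in sequence.items()}

# Stand-in for TFRecordDataset: three records of different lengths.
records = [serialize(40, 10), serialize(25, 7), serialize(33, 9)]
dataset = tf.data.Dataset.from_tensor_slices(records).map(decode)
dataset = dataset.padded_batch(3)   # pads each key to the longest element in the batch

batch = next(iter(dataset))
print(batch["feature_data"].shape, batch["labels"].shape)  # (3, 40) (3, 10)
```

Padding defaults to 0 for both dtypes; if 0 is a valid label, a different `padding_values` or a separate length feature may be needed.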

1 answer:

Answer 0 (score: 0)

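You can only use tf.FixedLenSequenceFeature when you know the shape of the feature. Otherwise, use tf.VarLenFeature instead. Note that the shape here is the per-step shape: the number of steps may vary, but each step's shape must be fully defined, which is why (None,) triggers the "Shape [?] is not fully defined" error.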
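A minimal sketch of this suggestion, assuming TensorFlow 2.x (`tf.io.VarLenFeature` and `tf.io.parse_single_sequence_example` are the 2.x names of the functions used in the question) and a hypothetical `build_example` helper with the question's layout. For a sequence feature, `VarLenFeature` yields a `SparseTensor` with dense shape `[num_steps, max_values_per_step]`, which `tf.sparse.to_dense` turns into a regular tensor:

```python
import numpy as np
import tensorflow as tf

def build_example(feature_values, label_values):
    """Hypothetical helper with the question's layout: a single step holding all values."""
    feature_data = tf.train.FeatureList(feature=[
        tf.train.Feature(float_list=tf.train.FloatList(value=feature_values))])
    labels = tf.train.FeatureList(feature=[
        tf.train.Feature(int64_list=tf.train.Int64List(value=label_values))])
    return tf.train.SequenceExample(feature_lists=tf.train.FeatureLists(
        feature_list={"feature_data": feature_data, "labels": labels}))

def decode(serialized_proto):
    sequence_features = {
        "feature_data": tf.io.VarLenFeature(tf.float32),   # per-step length unknown
        "labels": tf.io.VarLenFeature(tf.int64),
    }
    context, sequence = tf.io.parse_single_sequence_example(
        serialized_proto, sequence_features=sequence_features)
    # SparseTensor [num_steps, max_values_per_step] -> dense tensor of the same shape.
    sequence = {k: tf.sparse.to_dense(v) for k, v in sequence.items()}
    return context, sequence

serialized = build_example(np.random.normal(size=40).tolist(),
                           np.random.randint(0, 11, size=10).tolist()).SerializeToString()
context, sequence = decode(tf.constant(serialized))
print(sequence["feature_data"].shape, sequence["labels"].shape)  # (1, 40) (1, 10)
```

Because the writer stores everything in one step, the leading dimension is 1; indexing with `[0]` gives the plain 1-D vector.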