TensorFlow Keras: use tfrecords for validation as well

Date: 2019-02-25 16:07:17

Tags: python tensorflow keras deep-learning

I am currently using Keras with the TensorFlow backend. The dataset is stored in tfrecords format. Training without any validation set works, but how do I also integrate my validation tfrecords?

Let's take this code as a rough skeleton:

import tensorflow as tf
import keras

def _ds_parser(proto):
    features = {
        'X': tf.FixedLenFeature([], tf.string),
        'Y': tf.FixedLenFeature([], tf.string)
    }

    parsed_features = tf.parse_single_example(proto, features)

    # get the data back as float32
    parsed_features['X'] = tf.decode_raw(parsed_features['X'], tf.float32)
    parsed_features['Y'] = tf.decode_raw(parsed_features['Y'], tf.float32)

    return parsed_features['X'],  parsed_features['Y']

def datasetLoader(dataSetPath, batchSize):
    dataset = tf.data.TFRecordDataset(dataSetPath)

    # Map the parser over every record in the dataset. You can set the number of parallel calls here
    dataset = dataset.map(_ds_parser, num_parallel_calls=8)

    # This dataset will go on forever
    dataset = dataset.repeat()

    # Set the batchsize
    dataset = dataset.batch(batchSize)

    # Create an iterator
    iterator = dataset.make_one_shot_iterator()

    # Create your tf representation of the iterator
    X, Y = iterator.get_next()  

    # Bring the data back into shape
    X = tf.reshape(X, [-1, 66, 198, 3])
    Y = tf.reshape(Y,[-1,1])    

    return X, Y

X, Y = datasetLoader('PATH-TO-DATASET', 264)

model_X = keras.layers.Input(tensor=X)

model_output = keras.layers.Conv2D(filters=16, kernel_size=3, strides=1, padding='valid', activation='relu',
                                   input_shape=(66, 198, 3))(model_X)
# Flatten the convolutional feature maps so the single linear output matches Y's shape [-1, 1]
model_output = keras.layers.Flatten()(model_output)
model_output = keras.layers.Dense(units=1, activation='linear')(model_output)

model = keras.models.Model(inputs=model_X, outputs=model_output)

model.compile(
    optimizer=optimizer,
    loss='mean_squared_error',
    target_tensors=[Y]
)

model.fit(
    epochs=epochs,
    steps_per_epoch=stepPerEpoch,
    shuffle=False,
    validation_data=????
) 

The question is: how do I pass in the validation set?

I found something related here: gcloud-ml-engine-with-keras, but I am not sure how to adapt it to my problem.

2 Answers:

Answer 0 (score: 1)

Okay, I found the answer myself: basically, just change import keras to import tensorflow.keras as keras and you are done. tf.keras also lets you pass the validation set as tensors:

X, Y = datasetLoader('PATH-TO-DATASET', 264)
X_val, Y_val = datasetLoader('PATH-TO-VALIDATION-DATASET', 264)

# ... define and compile the model like above

model.fit(
    epochs=epochs,
    steps_per_epoch=STEPS_PER_EPOCH,
    shuffle=False,
    validation_data=(X_val, Y_val),
    validation_steps=STEPS_PER_VALIDATION_EPOCH
)  
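
For reference, here is the import change the answer describes, plus one way the step-count placeholders could be filled in. The record counts below are made-up numbers for illustration, since the answer does not state the dataset sizes; only the batch size of 264 comes from the question:

import tensorflow.keras as keras   # tf.keras instead of standalone keras

# The step placeholders above can be derived from the number of records
# in each tfrecords file and the batch size.
TRAIN_SET_SIZE = 100000            # hypothetical number of training records
VALIDATION_SET_SIZE = 10000        # hypothetical number of validation records
BATCH_SIZE = 264

STEPS_PER_EPOCH = TRAIN_SET_SIZE // BATCH_SIZE
STEPS_PER_VALIDATION_EPOCH = VALIDATION_SET_SIZE // BATCH_SIZE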

Answer 1 (score: 0)

First, you do not need to use an iterator. A Keras model will accept a dataset object instead of separate data/label arguments and will handle the iteration itself. You only need to specify steps_per_epoch, which means you need to know the size of the dataset. If you have separate tfrecords files for training and validation, you can simply create a dataset object for each and pass the validation one to validation_data. If you have a single file and want to split it, you can do

dataset = tf.data.TFRecordDataset('file.tfrecords')
dataset_train = dataset.take(size)
dataset_val = dataset.skip(size)

...
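
To make that concrete, here is a minimal sketch of feeding the split Dataset objects straight into model.fit. It assumes the _ds_parser from the question, a model built from an ordinary Input(shape=(66, 198, 3)) layer (rather than Input(tensor=X)) and compiled without target_tensors, the split point size from the snippet above, and a hypothetical val_size count for the remaining records:

import tensorflow as tf

batch_size = 264

def _parse_and_reshape(proto):
    # Reuse the parser from the question and reshape each single example
    x, y = _ds_parser(proto)
    x = tf.reshape(x, [66, 198, 3])
    y = tf.reshape(y, [1])
    return x, y

dataset = tf.data.TFRecordDataset('file.tfrecords')
dataset_train = dataset.take(size).map(_parse_and_reshape).batch(batch_size).repeat()
dataset_val = dataset.skip(size).map(_parse_and_reshape).batch(batch_size).repeat()

# Keras iterates over the dataset objects itself; only the step counts
# per epoch have to be supplied because the datasets repeat forever.
model.fit(
    dataset_train,
    epochs=epochs,
    steps_per_epoch=size // batch_size,
    validation_data=dataset_val,
    validation_steps=val_size // batch_size   # val_size: hypothetical count of remaining records
)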