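I am currently using Keras with the TensorFlow backend. My dataset is stored in TFRecords format. Training without a validation set works, but how do I also integrate my validation TFRecords?
Assume this code as a rough skeleton: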
import tensorflow as tf  # the TF 1.x API is used throughout
import keras

def _ds_parser(proto):
    # Both features are stored as raw byte strings
    features = {
        'X': tf.FixedLenFeature([], tf.string),
        'Y': tf.FixedLenFeature([], tf.string)
    }
    parsed_features = tf.parse_single_example(proto, features)
    # Get the data back as float32
    parsed_features['X'] = tf.decode_raw(parsed_features['X'], tf.float32)
    parsed_features['Y'] = tf.decode_raw(parsed_features['Y'], tf.float32)
    return parsed_features['X'], parsed_features['Y']
def datasetLoader(dataSetPath, batchSize):
    dataset = tf.data.TFRecordDataset(dataSetPath)
    # Apply the parser to every record; num_parallel_calls sets the number of parallel calls
    dataset = dataset.map(_ds_parser, num_parallel_calls=8)
    # This dataset will go on forever
    dataset = dataset.repeat()
    # Set the batch size
    dataset = dataset.batch(batchSize)
    # Create an iterator
    iterator = dataset.make_one_shot_iterator()
    # Create the tensors that yield the next batch
    X, Y = iterator.get_next()
    # Bring the data back into shape
    X = tf.reshape(X, [-1, 66, 198, 3])
    Y = tf.reshape(Y, [-1, 1])
    return X, Y
X, Y = datasetLoader('PATH-TO-DATASET', 264)
# The input shape is taken from the tensor X
model_X = keras.layers.Input(tensor=X)
model_output = keras.layers.Conv2D(filters=16, kernel_size=3, strides=1, padding='valid',
                                   activation='relu')(model_X)
# Flatten so that the Dense output has shape [-1, 1], like Y
model_output = keras.layers.Flatten()(model_output)
model_output = keras.layers.Dense(units=1, activation='linear')(model_output)
model = keras.models.Model(inputs=model_X, outputs=model_output)
model.compile(
    optimizer=optimizer,
    loss='mean_squared_error',
    target_tensors=[Y]
)
# parallel_model is not defined in this skeleton (presumably a multi-GPU wrapper around model)
parallel_model.fit(
    epochs=epochs,
    steps_per_epoch=stepPerEpoch,
    shuffle=False,
    validation_data=????  # <- what goes here?
)
The question is: how do I pass in the validation set?
I found something related here: gcloud-ml-engine-with-keras, but I am not sure how to adapt it to my problem.
Answer 0 (score: 1)
Well, I found the answer myself: basically, it is enough to change import keras to import tensorflow.keras as keras. tf.keras lets you pass the validation set as tensors as well:
X, Y = datasetLoader('PATH-TO-DATASET', 264)
X_val, Y_val = datasetLoader('PATH-TO-VALIDATION-DATASET', 264)
# ... define and compile the model like above
parallel_model.fit(
    epochs=epochs,
    steps_per_epoch=STEPS_PER_EPOCH,
    shuffle=False,
    validation_data=(X_val, Y_val),
    validation_steps=STEPS_PER_VALIDATION_EPOCH
)
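STEPS_PER_EPOCH and STEPS_PER_VALIDATION_EPOCH are not defined in the snippet above. Because the loader repeats the dataset forever, Keras relies on these two values to know where an epoch ends. A minimal sketch of one way to derive them, assuming you know how many records each TFRecords file holds (the record counts and the helper name steps_for are made up for illustration):

```python
import math


def steps_for(num_records, batch_size):
    """Number of batches needed to cover every record once (hypothetical helper)."""
    if num_records <= 0 or batch_size <= 0:
        raise ValueError("num_records and batch_size must be positive")
    # Round up so records that do not fill a whole batch are still covered.
    return math.ceil(num_records / batch_size)


BATCH_SIZE = 264           # batch size used in the question
NUM_TRAIN_RECORDS = 50000  # assumed value, replace with your real count
NUM_VAL_RECORDS = 5000     # assumed value, replace with your real count

STEPS_PER_EPOCH = steps_for(NUM_TRAIN_RECORDS, BATCH_SIZE)
STEPS_PER_VALIDATION_EPOCH = steps_for(NUM_VAL_RECORDS, BATCH_SIZE)
```

Rounding down instead would also work with a repeating dataset; it just leaves a few records for the next epoch.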
Answer 1 (score: 0)
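First of all, you do not need to use an iterator. A Keras model accepts a dataset object in place of separate data/label arguments and handles the iteration itself. You only need to specify steps_per_epoch, so you need to know the size of your dataset. If you have separate tfrecords files for training and validation, just create a dataset object for each and pass the validation one to validation_data. If you have a single file and want to split it, you can do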
dataset = tf.data.TFRecordDataset('file.tfrecords')
dataset_train = dataset.take(size)
dataset_val = dataset.skip(size)
...
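In the snippet above, size is the number of records that go to the training split, and everything after it goes to validation. A rough sketch of how size could be chosen and of what take and skip do, using a plain Python list as a stand-in for the dataset (split_sizes, the 80/20 ratio, and the record count are assumptions for illustration, not part of the answer):

```python
from itertools import islice


def split_sizes(num_records, val_fraction=0.2):
    """Return (train_size, val_size) for a positional take/skip split."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    val_size = int(num_records * val_fraction)
    train_size = num_records - val_size
    return train_size, val_size


records = list(range(10))  # stand-in for the records of one TFRecords file
size, val_size = split_sizes(len(records))

train_part = list(islice(records, size))      # mirrors dataset.take(size)
val_part = list(islice(records, size, None))  # mirrors dataset.skip(size)
```

The split is purely positional: if the file is ordered (for example by class or by time), the validation part contains only the tail of that ordering.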