TensorFlow - iterating training and validation in turn

Time: 2019-01-10 20:26:20

Tags: python-3.x tensorflow

I have been using TensorFlow's Dataset API to easily feed different datasets into an RNN model.

With the help of the (not so many) blog posts and the documentation on the TensorFlow site, I got everything working. My working example does the following:

--- Train for X epochs on a training dataset -> validate on a validation dataset once all the training has finished.

However, I am unable to develop the following example:

--- Train for X epochs on a training dataset -> validate the model being trained on a validation dataset at every epoch (a bit like what Keras does).

The problematic part is the following code:

train_dataset = tf.data.Dataset.from_tensor_slices((x,y)).batch(BATCH_SIZE, drop_remainder=True).repeat()

val_dataset = tf.data.Dataset.from_tensor_slices((x,y)).batch(BATCH_SIZE_VAL, drop_remainder=True).repeat()

itr = tf.data.Iterator.from_structure(train_dataset.output_types, train_dataset.output_shapes)
train_init_op = itr.make_initializer(train_dataset)
validation_init_op = itr.make_initializer(val_dataset)

When creating the iterator with from_structure, an output_shape needs to be specified. Obviously, the training dataset and the validation dataset have different output shapes, since their batch_size differs. However, validation_init_op throws the following error, which seems counter-intuitive because validation sets have always had a different batch_size:

TypeError: Expected output shapes compatible with (TensorShape([Dimension(256), Dimension(11), Dimension(74)]), TensorShape([Dimension(256), Dimension(3)])) but got dataset with output shapes (TensorShape([Dimension(28), Dimension(11), Dimension(74)]), TensorShape([Dimension(28), Dimension(3)])).
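For reference, here is a minimal, self-contained reproduction of where those static batch dimensions come from (TF 1.x, dummy arrays with made-up sizes, not my real data). With drop_remainder=True the batch size is baked into output_shapes, so the two datasets stop being shape-compatible; keeping the batch sizes equal, or leaving drop_remainder at its default, keeps the leading dimension dynamic so from_structure accepts both datasets:

import numpy as np
import tensorflow as tf

# Dummy arrays, only to illustrate the shape mismatch.
x_dummy = np.zeros((300, 11, 74), dtype=np.float32)
y_dummy = np.zeros((300, 3), dtype=np.float32)

ds_a = tf.data.Dataset.from_tensor_slices((x_dummy, y_dummy)).batch(256, drop_remainder=True)
ds_b = tf.data.Dataset.from_tensor_slices((x_dummy, y_dummy)).batch(28, drop_remainder=True)
print(ds_a.output_shapes)  # ((256, 11, 74), (256, 3)) -> leading dimension is static
print(ds_b.output_shapes)  # ((28, 11, 74), (28, 3))   -> incompatible with ds_a

# Without drop_remainder the leading dimension stays dynamic (?), so the shapes are compatible.
ds_c = tf.data.Dataset.from_tensor_slices((x_dummy, y_dummy)).batch(256)
ds_d = tf.data.Dataset.from_tensor_slices((x_dummy, y_dummy)).batch(28)
itr_ok = tf.data.Iterator.from_structure(ds_c.output_types, ds_c.output_shapes)
init_c = itr_ok.make_initializer(ds_c)
init_d = itr_ok.make_initializer(ds_d)  # no TypeError here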

I want to go for the second option in order to evaluate my model and watch the usual training and validation curves develop together, so I can see how to improve it (stopping the learning earlier, etc.). However, with the first, simpler option I do not get all of that.

So, the questions are: am I doing anything wrong? Does my second approach have to be handled differently? I could think of creating two iterators, but I do not know if that is the right approach. Also, the answer by @MatthewScarpino points to a feedable iterator, because switching between reinitializable iterators makes them start over; however, the error above does not seem related to that part of the code. Maybe the reinitializable iterator is not intended to take a different batch size for the validation set and to iterate over it just once after training, without setting it in the .batch() method?

Any help is much appreciated.

Full code for reference:

N_TIMESTEPS_X = xt.shape[0] ## The stack number
BATCH_SIZE = 256
#N_OBSERVATIONS = xt.shape[1]
N_FEATURES = xt.shape[2]
N_OUTPUTS = yt.shape[1]
N_NEURONS_LSTM = 128 ## Number of units in the LSTMCell 
N_EPOCHS = 350
LEARNING_RATE = 0.001

### Define the placeholders and gather the data.
xt = xt.transpose([1,0,2])
xval = xval.transpose([1,0,2])

train_data = (xt, yt)
validation_data = (xval, yval)

N_BATCHES = train_data[0].shape[0] // BATCH_SIZE
print('The number of batches is: {}'.format(N_BATCHES))
BATCH_SIZE_VAL = validation_data[0].shape[0] // N_BATCHES
print('The validation batch size is: {}'.format(BATCH_SIZE_VAL))

## We define the placeholders as a trick so that we do not break into memory problems, associated with feeding the data directly.
'''As an alternative, you can define the Dataset in terms of tf.placeholder() tensors, and feed the NumPy arrays when you initialize an Iterator over the dataset.'''
batch_size = tf.placeholder(tf.int64)
x = tf.placeholder(tf.float32, shape=[None, N_TIMESTEPS_X, N_FEATURES], name='XPlaceholder')
y = tf.placeholder(tf.float32, shape=[None, N_OUTPUTS], name='YPlaceholder')

# Creating the two different dataset objects.
train_dataset = tf.data.Dataset.from_tensor_slices((x,y)).batch(BATCH_SIZE, drop_remainder=True).repeat()
val_dataset = tf.data.Dataset.from_tensor_slices((x,y)).batch(BATCH_SIZE_VAL, drop_remainder=True).repeat()

# Creating the Iterator type that permits to switch between datasets.
itr = tf.data.Iterator.from_structure(train_dataset.output_types, train_dataset.output_shapes)
train_init_op = itr.make_initializer(train_dataset)
validation_init_op = itr.make_initializer(val_dataset)

next_features, next_labels = itr.get_next()

1 Answer:

Answer 0 (score: 0)

After investigating the best way to do this, I came across this final implementation, which works well for me. It is surely not the best one. To keep the state, I used a feedable iterator.

AIM: this code is meant to be used when you want to train and validate at the same time, preserving the state of each iterator (i.e. validating with the latest model parameters). Besides that, the code also saves the model and other things, such as information about the hyperparameters and the summaries to visualize the training and validation in TensorBoard.

Also, do not get confused: you do not need to have a different batch size for the training set and the validation set. This was a misconception of mine. The batch sizes must be the same, and you have to deal with the different number of batches, just passing when no more batches are left. This is a requirement so that the iterator can be created with both datasets having the same data type and shape.
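Before the full listing, here is a minimal sketch of the switching mechanism it relies on (TF 1.x, dummy data and illustrative names, not the real arrays): one reinitializable iterator with two initializers, plus a string-handle placeholder that the graph reads from, so the same get_next() tensors serve both training and validation batches:

import numpy as np
import tensorflow as tf

x_np = np.zeros((512, 11, 74), dtype=np.float32)
y_np = np.zeros((512, 3), dtype=np.float32)
BATCH = 256  # same batch size for both datasets

x = tf.placeholder(tf.float32, [None, 11, 74])
y = tf.placeholder(tf.float32, [None, 3])
train_ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(BATCH, drop_remainder=True).repeat()
val_ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(BATCH, drop_remainder=True).repeat()

handle = tf.placeholder(tf.string, shape=[])
iterator = tf.data.Iterator.from_string_handle(handle, train_ds.output_types, train_ds.output_shapes)
features, labels = iterator.get_next()  # the model would be built on top of these tensors

shared_it = tf.data.Iterator.from_structure(train_ds.output_types, train_ds.output_shapes)
train_init = shared_it.make_initializer(train_ds)
val_init = shared_it.make_initializer(val_ds)

with tf.Session() as sess:
    shared_handle = sess.run(shared_it.string_handle())
    sess.run(train_init, feed_dict={x: x_np, y: y_np})
    print(sess.run(features, feed_dict={handle: shared_handle}).shape)  # a training batch
    sess.run(val_init, feed_dict={x: x_np, y: y_np})
    print(sess.run(features, feed_dict={handle: shared_handle}).shape)  # a validation batch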

I hope it helps others. Just ignore the parts of the code that are not relevant for your goals. Many thanks to @kvish for all the help and time.

Code:

# Imports assumed by this listing (they are not shown in the original post); thresholdSet is a
# user-defined helper that is not included here either.
import os
from datetime import datetime
import numpy as np
import tensorflow as tf
from sklearn.metrics import confusion_matrix

def RNNmodelTF(xt, yt, xval, yval, xtest, ytest):

N_TIMESTEPS_X = xt.shape[0] ## The stack number
BATCH_SIZE = 256
#N_OBSERVATIONS = xt.shape[1]
N_FEATURES = xt.shape[2]
N_OUTPUTS = yt.shape[1]
N_NEURONS_LSTM = 128 ## Number of units in the LSTMCell 
N_EPOCHS = 350
LEARNING_RATE = 0.001

### Define the placeholders and gather the data.
xt = xt.transpose([1,0,2])
xval = xval.transpose([1,0,2])

train_data = (xt, yt)
validation_data = (xval, yval)

N_BATCHES = train_data[0].shape[0] // BATCH_SIZE

## We define the placeholders as a trick so that we do not break into memory problems, associated with feeding the data directly.
'''As an alternative, you can define the Dataset in terms of tf.placeholder() tensors, and feed the NumPy arrays when you initialize an Iterator over the dataset.'''
batch_size = tf.placeholder(tf.int64)
x = tf.placeholder(tf.float32, shape=[None, N_TIMESTEPS_X, N_FEATURES], name='XPlaceholder')
y = tf.placeholder(tf.float32, shape=[None, N_OUTPUTS], name='YPlaceholder')

# Creating the two different dataset objects.
train_dataset = tf.data.Dataset.from_tensor_slices((x,y)).batch(BATCH_SIZE, drop_remainder=True).repeat()
val_dataset = tf.data.Dataset.from_tensor_slices((x,y)).batch(BATCH_SIZE, drop_remainder=True).repeat()

#################### Creating the Iterator type that permits to switch between datasets.

handle = tf.placeholder(tf.string, shape = [])
iterator = tf.data.Iterator.from_string_handle(handle, train_dataset.output_types, train_dataset.output_shapes)
next_features, next_labels = iterator.get_next()

train_val_iterator = tf.data.Iterator.from_structure(train_dataset.output_types, train_dataset.output_shapes)
train_iterator = train_val_iterator.make_initializer(train_dataset)
val_iterator = train_val_iterator.make_initializer(val_dataset)
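# The string-handle iterator above is the one the graph actually reads from (get_next);
# at run time it is pointed at train_val_iterator by feeding that iterator's string handle.
# train_iterator / val_iterator are just the two initializer ops that point train_val_iterator
# at the training or the validation dataset, respectively.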

###########################

### Create the graph 
cellType = tf.nn.rnn_cell.LSTMCell(num_units=N_NEURONS_LSTM, name='LSTMCell')
inputs = tf.unstack(next_features, axis=1)
'''inputs: A length T list of inputs, each a Tensor of shape [batch_size, input_size]'''
RNNOutputs, _ = tf.nn.static_rnn(cell=cellType, inputs=inputs, dtype=tf.float32)
out_weights = tf.get_variable("out_weights", shape=[N_NEURONS_LSTM, N_OUTPUTS], dtype=tf.float32, initializer=tf.contrib.layers.xavier_initializer())
out_bias = tf.get_variable("out_bias", shape=[N_OUTPUTS], dtype=tf.float32, initializer=tf.zeros_initializer())
predictionsLayer = tf.matmul(RNNOutputs[-1], out_weights) + out_bias

### Define the cost function, that will be optimized by the optimizer. 
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=predictionsLayer, labels=next_labels, name='Softmax_plus_Cross_Entropy'))
optimizer_type = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE, name='AdamOptimizer')
optimizer = optimizer_type.minimize(cost)

### Model evaluation 
correctPrediction = tf.equal(tf.argmax(predictionsLayer,1), tf.argmax(next_labels,1))
accuracy = tf.reduce_mean(tf.cast(correctPrediction,tf.float32))

confusionMatrix1 = tf.confusion_matrix(tf.argmax(next_labels,1), tf.argmax(predictionsLayer,1), num_classes=3, name='ConfMatrix')

## Saving variables so that we can restore them afterwards.
saver = tf.train.Saver()
save_dir = '/media/SecondDiskHDD/8classModels/DLmodels/tfModels/{}_{}'.format(cellType.__class__.__name__, datetime.now().strftime("%Y%m%d%H%M%S"))
#save_dir = '/home/Desktop/tfModels/{}_{}'.format(cellType.__class__.__name__, datetime.now().strftime("%Y%m%d%H%M%S"))
os.mkdir(save_dir)
varDict = {'nTimeSteps': N_TIMESTEPS_X, 'BatchSize': BATCH_SIZE, 'nFeatures': N_FEATURES,
           'nNeuronsLSTM': N_NEURONS_LSTM, 'nEpochs': N_EPOCHS,
           'learningRate': LEARNING_RATE, 'optimizerType': optimizer_type.__class__.__name__}
varDicSavingTxt = save_dir + '/varDict.txt'
modelFilesDir = save_dir + '/modelFiles'
os.mkdir(modelFilesDir)

logDir = save_dir + '/TBoardLogs'
os.mkdir(logDir)

acc_summary = tf.summary.scalar('Accuracy', accuracy)
loss_summary = tf.summary.scalar('Cost_CrossEntropy', cost)
summary_merged = tf.summary.merge_all()

with open(varDicSavingTxt, 'w') as outfile:
    outfile.write(repr(varDict))

with tf.Session() as sess:

    tf.set_random_seed(2)
    sess.run(tf.global_variables_initializer())
    train_writer = tf.summary.FileWriter(logDir + '/train', sess.graph)
    validation_writer = tf.summary.FileWriter(logDir + '/validation')

    # initialise iterator with data
    train_val_string = sess.run(train_val_iterator.string_handle())
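    # train_val_string is the string handle of the shared iterator: feeding it through the
    # `handle` placeholder makes every get_next() read from train_val_iterator, whichever
    # dataset it was last initialised with (via train_iterator or val_iterator).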

    cm1Total = None
    cm2Total = None

    print('¡Training starts!')
    for epoch in range(N_EPOCHS):

        batchAccList = []
        batchAccListVal = []
        tot_loss_train = 0
        tot_loss_validation = 0

        for batch in range(N_BATCHES):

            sess.run(train_iterator, feed_dict = {x : train_data[0], y: train_data[1], batch_size: BATCH_SIZE})
            optimizer_output, loss_value, summary, accBatch, cm1 = sess.run([optimizer, cost, summary_merged, accuracy, confusionMatrix1], feed_dict = {handle: train_val_string})

            npArrayPred = predictionsLayer.eval(feed_dict= {handle: train_val_string})
            predLabEnc = np.apply_along_axis(thresholdSet, 1, npArrayPred, value=0.5)

            npArrayLab = next_labels.eval(feed_dict= {handle: train_val_string})
            labLabEnc = np.argmax(npArrayLab, 1)

            cm2 = confusion_matrix(labLabEnc, predLabEnc)
            tot_loss_train += loss_value
            batchAccList.append(accBatch)

            try:
                sess.run(val_iterator, feed_dict = {x: validation_data[0], y: validation_data[1], batch_size: BATCH_SIZE})
                valLoss, valAcc, summary_val = sess.run([cost, accuracy, summary_merged], feed_dict = {handle: train_val_string})
                tot_loss_validation += valLoss
                batchAccListVal.append(valAcc)

            except tf.errors.OutOfRangeError:
                pass

            if cm1Total is None and cm2Total is None:

                cm1Total = cm1
                cm2Total = cm2
            else:

                cm1Total += cm1
                cm2Total += cm2

            if batch % 10 == 0:

                train_writer.add_summary(summary, batch)
                validation_writer.add_summary(summary_val, batch)

        epochAcc = tf.reduce_mean(batchAccList)
        sess.run(train_iterator, feed_dict = {x : train_data[0], y: train_data[1], batch_size: BATCH_SIZE})
        epochAcc_num = sess.run(epochAcc, feed_dict = {handle: train_val_string})

        epochAccVal = tf.reduce_mean(batchAccListVal)
        sess.run(val_iterator, feed_dict = {x: validation_data[0], y: validation_data[1], batch_size: BATCH_SIZE})
        epochAcc_num_Val = sess.run(epochAccVal, feed_dict = {handle: train_val_string})

        if epoch%10 == 0:

            print("Epoch: {}, Loss: {:.4f}, Accuracy: {:.3f}".format(epoch, tot_loss_train / N_BATCHES, epochAcc_num))
            print('Validation Loss: {:.4f}, Validation Accuracy: {:.3f}'.format(tot_loss_validation / N_BATCHES, epochAcc_num_Val))

    cmLogFile1 = save_dir + '/cm1File.txt'
    with open(cmLogFile1, 'w') as outfile:
        outfile.write(repr(cm1Total))

    cmLogFile2 = save_dir + '/cm2File.txt'
    with open(cmLogFile2, 'w') as outfile:
        outfile.write(repr(cm2Total))

    saver.save(sess, modelFilesDir + '/model.ckpt')