tf.train.MonitoredTrainingSession and a reinitializable iterator from a Dataset

Time: 2017-08-29 18:20:10

Tags: tensorflow tensorflow-datasets

It seems that MonitoredTrainingSession performs some operations (logging?) before the first call to .run(..), which means that when I do this:

import tensorflow as tf

train_data = reader.traindata()  # returns a tf.contrib.data.Dataset
it = tf.contrib.data.Iterator.from_structure(train_data.output_types, train_data.output_shapes)
init_train = it.make_initializer(train_data)  # op that (re)initializes the iterator
ne = it.get_next()  # tensor that yields the next element
ts = tf.train.MonitoredTrainingSession(checkpoint_dir=save_path)

# ... no calls to ts.run ...

ts.run(init_train)  # first call to run()

This produces the error:

FailedPreconditionError (see above for traceback): GetNext() failed because the iterator has not been initialized. Ensure that you have run the initializer operation for this iterator before getting the next element

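So it seems that MonitoredTrainingSession runs some operations of its own before running the op I supply, which makes it impossible to use it together with reinitializable iterators from a Dataset.

I'm sure I'm missing something, and would love to hear what :-)

1 answer:

Answer 0: (score: 7)

It looks like there is no direct solution in TensorFlow yet. And yes, it is odd that the Dataset API is not fully supported here.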

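The reason is that the monitored session skips running init_op when it loads from a checkpoint. The iterator initializer should therefore be treated like a local-variable initializer, that is, run as part of local_init_op.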
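The start-up order described above can be sketched with a toy model in plain Python. This is not TensorFlow code: the function `ops_run_at_session_start` and the string op names are hypothetical stand-ins, and the sketch assumes that local_init_op runs on every session start while init_op runs only when no checkpoint is restored.

```python
def ops_run_at_session_start(checkpoint_exists, init_op, local_init_op):
    """Toy model (an assumption, not TensorFlow's implementation) of which
    ops a monitored session runs when it is created."""
    ran = []
    if checkpoint_exists:
        # Variables are restored from the checkpoint, so init_op is skipped.
        ran.append("restore_from_checkpoint")
    else:
        # Fresh start: everything grouped into init_op runs.
        ran.extend(init_op)
    # Assumed to run in both cases, which is why the workaround relies on it.
    ran.extend(local_init_op)
    return ran


# Iterator initializer grouped into init_op: skipped once a checkpoint exists.
in_init_op = ops_run_at_session_start(
    checkpoint_exists=True,
    init_op=["global_variables_initializer", "init_train"],
    local_init_op=["local_variables_initializer"],
)

# Iterator initializer grouped into local_init_op: runs on every start.
in_local_init_op = ops_run_at_session_start(
    checkpoint_exists=True,
    init_op=["global_variables_initializer"],
    local_init_op=["local_variables_initializer", "init_train"],
)

print("init_train" in in_init_op)        # False
print("init_train" in in_local_init_op)  # True
```

In this model the iterator is only guaranteed to be initialized after a restart if its initializer is grouped with the local initializers.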

The currently suggested workaround is given in this issue: https://github.com/tensorflow/tensorflow/issues/12859

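# Group the iterator initializer into local_init_op so that it still runs
# when the session is restored from a checkpoint.
scaffold = tf.train.Scaffold(
    local_init_op=tf.group(tf.local_variables_initializer(), init_train))
with tf.train.MonitoredTrainingSession(scaffold=scaffold,
                                       checkpoint_dir=checkpoint_dir) as sess:
    while not sess.should_stop():
        sess.run(train_op)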