TensorFlow saver: the kernel appears to have died

Asked: 2016-12-29 20:33:14

Tags: python tensorflow ubuntu-14.04 jupyter-notebook

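I'm having a lot of trouble saving and restoring a TensorFlow model: either "The kernel appears to have died", or I get an error ("Variable ... already exists").

When the kernel dies, I get this error log in the console: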

[I 21:13:41.505 NotebookApp] Saving file at /Nanodegree_MachineLearning/06_Capstone/capstone.ipynb
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
[I 21:17:05.416 NotebookApp] KernelRestarter: restarting kernel (1/5)
WARNING:root:kernel 81679b46-ec9b-4ce6-b5be-ae2d9cf01210 restarted
[I 21:17:41.778 NotebookApp] Saving file at /Nanodegree_MachineLearning/06_Capstone/capstone.ipynb
[19324:20881:1229/212110:ERROR:object_proxy.cc(583)] Failed to call method: org.freedesktop.UPower.GetDisplayDevice: object_path= /org/freedesktop/UPower: org.freedesktop.DBus.Error.UnknownMethod: Method "GetDisplayDevice" with signature "" on interface "org.freedesktop.UPower" doesn't exist

My code is as follows:

if __name__ == '__main__':
    if LEARN_MODUS:
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            steps_per_epoch = len(X_train) // BATCH_SIZE
            num_examples = steps_per_epoch * BATCH_SIZE

            # Train model
            for i in range(EPOCHS):
                for step in range(steps_per_epoch):
                    #Calculate next Batch
                    batch_start = step * BATCH_SIZE
                    batch_end = (step + 1) * BATCH_SIZE
                    batch_x = X_train[batch_start:batch_end] 
                    batch_y = y_train[batch_start:batch_end]

                    #Run Training
                    loss = sess.run(train_op, feed_dict={x: batch_x, y: batch_y, keep_prob: 0.5})

            try:
                saver
            except NameError:
                saver = tf.train.Saver()
            saver.save(sess, 'foo')
            print("Model saved")

To restore the model, I use:

predicions = tf.argmax(fc2,1)
predicted_classes = []

try:
    saver
except NameError:
    saver = tf.train.Saver()

with tf.Session() as sess:   
    saver = tf.train.import_meta_graph('foo.meta')
    saver.restore(sess, tf.train.latest_checkpoint('./'))

    predicted_classes = sess.run(predicions, feed_dict={x: X_test, keep_prob: 1.0})

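I have tried many different approaches. Sometimes it works (but not always!?), sometimes it crashes, and sometimes I get the Variable error. Do I have to use save/restore in some other way?

I am using: Ubuntu 14.04, Anaconda 3, Python 3.5.2, TensorFlow 0.12

inside a Jupyter notebook.

Thanks!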

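1 Answer:

Answer 0: (score: 3)

This can happen when you run out of memory; the fix is to try a smaller batch size. I see that you feed the whole test set into a single run call, which requires enough memory to process all examples at once. You could do something like eval_in_batches to aggregate the accuracy over several smaller run calls.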
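
As a rough illustration of that idea, here is a minimal sketch in plain Python. The helper name `predict_in_batches` and its `run_batch` callback are hypothetical names chosen for this example; this is not the `eval_in_batches` helper mentioned above, only one possible way to split a large input into smaller run calls and concatenate the results.

```python
def predict_in_batches(run_batch, inputs, batch_size):
    """Call `run_batch` on consecutive slices of `inputs` and join the results.

    `run_batch` is a hypothetical callback that takes one slice of the inputs
    and returns one prediction per example in that slice.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    results = []
    for start in range(0, len(inputs), batch_size):
        # The last slice may be shorter than batch_size; slicing handles that.
        batch = inputs[start:start + batch_size]
        results.extend(run_batch(batch))
    return results


# Assumed usage inside the restore session from the question
# (sess, predicions, x, keep_prob, X_test and BATCH_SIZE come from there):
#
# predicted_classes = predict_in_batches(
#     lambda batch: sess.run(predicions, feed_dict={x: batch, keep_prob: 1.0}),
#     X_test, BATCH_SIZE)
```

With this shape, only one batch of test examples has to fit in memory per run call, at the cost of more calls. Accuracy can then be computed once from the concatenated predictions.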