I am a beginner with TensorFlow. My TensorFlow script suddenly exits with Killed. My code is as follows:
import tensorflow as tf
from sklearn.utils import shuffle

# Load data X_train, y_train and X_valid, y_valid

# An image augmentation pipeline
def augment(x):
    x = tf.image.random_brightness(x, max_delta=0.2)
    x = tf.image.random_contrast(x, 0.5, 2)
    return x

X_train, y_train = shuffle(X_train, y_train)

def LeNet(x):
    # Define LeNet architecture
    return logits

# Features:
x = tf.placeholder(tf.float32, (None, 32, 32, 3))
# Labels:
y = tf.placeholder(tf.int32, (None))
# Dropout probability
prob = tf.placeholder(tf.float32, (None))
# Learning rate
rate = tf.placeholder(tf.float32, (None))
rate_summary = tf.summary.scalar('learning rate', rate)

logits = LeNet(x)
accuracy_operation = ...  # defined accuracy_operation
accuracy_summary = tf.summary.scalar('validation accuracy', accuracy_operation)
saver = tf.train.Saver()
summary = tf.summary.merge_all()
writer = tf.summary.FileWriter('./summary', tf.get_default_graph())

def evaluate(X_data, y_data):
    # Return accuracy with X_data, y_data
    return accuracy

with tf.Session() as sess:
    saver.restore(sess, './lenet')
    for i in range(EPOCHS):
        X_train, y_train = shuffle(X_train, y_train)
        for offset in range(0, len(X_train), BATCH_SIZE):
            end = offset + BATCH_SIZE
            batch_x, batch_y = X_train[offset:end], y_train[offset:end]
            batch_x = sess.run(augment(batch_x))
            # Run the training operation, update learning rate
        validation_accuracy = evaluate(X_valid, y_valid)
        writer.add_summary(sess.run(summary, feed_dict={x: X_valid, y: y_valid, prob: 1., rate: alpha}))
I have omitted the parts that I am sure are not causing the problem. I know those parts are fine because the script ran without any trouble before they were touched. After I added certain parts (mainly the summary writer operations), the script suddenly prints Killed and exits after running a certain number of training operations. I suspect a memory leak, but I have not been able to detect it.
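For reference, here is a minimal sketch of one way to check whether the default graph keeps growing between iterations (assuming TF 1.x); the op-count check below is illustrative and not part of the script above. Ops built inside the loop, such as the augment(batch_x) call, add new nodes to the graph on every call:

import tensorflow as tf

# Count the nodes in the default graph before and after a few training
# iterations; a steadily rising count means new ops are being created
# inside the loop, which grows memory until the process is killed.
ops_before = len(tf.get_default_graph().get_operations())
# ... run a few training iterations here ...
ops_after = len(tf.get_default_graph().get_operations())
print('graph grew by', ops_after - ops_before, 'ops')

# Alternatively, freeze the graph once it is fully built; any later
# attempt to add an op raises an error pointing at the offending line.
tf.get_default_graph().finalize()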
Answer 0 (score: 1)
I ran into a similar problem a few days ago. In my case, I had some operations that were computationally very heavy, which I only found out later. Once I reduced the size of my tensors, the message disappeared and my code ran. I cannot tell you exactly what is causing the problem in your case, but from my experience, and from what you said (the error only appears once you add the summaries), I would suggest reducing the size of your X_valid and y_valid. It may simply be that the writer cannot cope with that much data...
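For example, instead of feeding the entire validation set into the merged summary op at once, you could feed a fixed-size random sample of it. A minimal sketch, assuming the placeholders and variables from your question; VALID_SAMPLE is a hypothetical cap I made up, and X_valid / y_valid are assumed to be NumPy arrays:

import numpy as np

VALID_SAMPLE = 512  # hypothetical cap on how much validation data is summarized

# Draw a random subsample of the validation set and evaluate the
# merged summary on that instead of the full X_valid / y_valid.
idx = np.random.choice(len(X_valid), size=min(VALID_SAMPLE, len(X_valid)), replace=False)
summ = sess.run(summary, feed_dict={x: X_valid[idx], y: y_valid[idx], prob: 1., rate: alpha})
writer.add_summary(summ, global_step=i)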