I am a beginner with TensorFlow. My TensorFlow script suddenly exits with Killed. My code is as follows:
import tensorflow as tf
from sklearn.utils import shuffle

# Load data X_train, y_train and X_valid, y_valid

# An image augmentation pipeline
def augment(x):
    x = tf.image.random_brightness(x, max_delta=0.2)
    x = tf.image.random_contrast(x, 0.5, 2)
    return x

X_train, y_train = shuffle(X_train, y_train)

def LeNet(x):
    # Define LeNet architecture
    return logits

# Features:
x = tf.placeholder(tf.float32, (None, 32, 32, 3))
# Labels:
y = tf.placeholder(tf.int32, (None))
# Dropout probability
prob = tf.placeholder(tf.float32, (None))
# Learning rate
rate = tf.placeholder(tf.float32, (None))
rate_summary = tf.summary.scalar('learning rate', rate)

logits = LeNet(x)
accuracy_operation = ...  # defined accuracy_operation
accuracy_summary = tf.summary.scalar('validation accuracy', accuracy_operation)
saver = tf.train.Saver()
summary = tf.summary.merge_all()
writer = tf.summary.FileWriter('./summary', tf.get_default_graph())

def evaluate(X_data, y_data):
    # Return accuracy with X_data, y_data
    return accuracy

with tf.Session() as sess:
    saver.restore(sess, './lenet')
    for i in range(EPOCHS):
        X_train, y_train = shuffle(X_train, y_train)
        for offset in range(0, len(X_train), BATCH_SIZE):
            end = offset + BATCH_SIZE
            batch_x, batch_y = X_train[offset:end], y_train[offset:end]
            batch_x = sess.run(augment(batch_x))
            # Run the training operation, update learning rate
        validation_accuracy = evaluate(X_valid, y_valid)
        writer.add_summary(sess.run(summary, feed_dict={x: X_valid, y: y_valid, prob: 1., rate: alpha}))
I have omitted the parts that I am sure are not causing the problem. I know those parts are fine because the script ran without any trouble before they were touched. After I added certain parts (mainly the summary writer operations), the script suddenly prints Killed and exits after running a certain number of training operations. I suspect a memory leak, but I have not been able to detect it.
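For reference, here is a minimal sketch of one way to check whether the default graph keeps growing between iterations (assuming TF 1.x); the op-count check below is illustrative and not part of the script above. Ops built inside the loop, such as the augment(batch_x) call, add new nodes to the graph on every call:

import tensorflow as tf

# Count the nodes in the default graph before and after a few training
# iterations; a steadily rising count means new ops are being created
# inside the loop, which grows memory until the process is killed.
ops_before = len(tf.get_default_graph().get_operations())
# ... run a few training iterations here ...
ops_after = len(tf.get_default_graph().get_operations())
print('graph grew by', ops_after - ops_before, 'ops')

# Alternatively, freeze the graph once it is fully built; any later
# attempt to add an op raises an error pointing at the offending line.
tf.get_default_graph().finalize()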
Answer 0 (score: 1)
I ran into a similar problem a few days ago. In my case, I had some operations that were computationally very heavy, which I only found out later. Once I reduced the size of my tensors, the message disappeared and my code ran. I cannot tell you exactly what is causing the problem in your case, but from my experience, and from what you said (the error only appears once you add the summaries), I would suggest reducing the size of your X_valid and y_valid. It may simply be that the writer cannot cope with that much data...
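For example, instead of feeding the entire validation set into the merged summary op at once, you could feed a fixed-size random sample of it. A minimal sketch, assuming the placeholders and variables from your question; VALID_SAMPLE is a hypothetical cap I made up, and X_valid / y_valid are assumed to be NumPy arrays:

import numpy as np

VALID_SAMPLE = 512  # hypothetical cap on how much validation data is summarized

# Draw a random subsample of the validation set and evaluate the
# merged summary on that instead of the full X_valid / y_valid.
idx = np.random.choice(len(X_valid), size=min(VALID_SAMPLE, len(X_valid)), replace=False)
summ = sess.run(summary, feed_dict={x: X_valid[idx], y: y_valid[idx], prob: 1., rate: alpha})
writer.add_summary(summ, global_step=i)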