在尝试运行我的3D卷积神经网络时,我收到以下错误。可能是什么原因?
ResourceExhaustedError(参见上面的回溯):分配时的OOM 张量形状[54080,1024] [[节点:Variable_10 / Adam / Assign = 分配[T = DT_FLOAT,_ class = [" loc:@ Variable_10"],use_locking = true, validate_shape = TRUE, _device =" / job:localhost / replica:0 / task:0 / gpu:0"](Variable_10 / Adam,zeros_4)]]
这是我用过的代码:
import tensorflow as tf
import numpy as np
IMG_SIZE_PX = 50
SLICE_COUNT = 20
n_classes = 2
batch_size = 10
x = tf.placeholder('float')
y = tf.placeholder('float')
keep_rate = 0.8
def conv3d(x, W):
return tf.nn.conv3d(x, W, strides=[1,1,1,1,1], padding='SAME')
def maxpool3d(x):
return tf.nn.max_pool3d(x, ksize=[1,2,2,2,1], strides=[1,2,2,2,1], padding='SAME')
def convolutional_neural_network(x):
weights = {'W_conv1':tf.Variable(tf.random_normal([3,3,3,1,32])),
'W_conv2':tf.Variable(tf.random_normal([3,3,3,32,64])),
'W_fc':tf.Variable(tf.random_normal([54080,1024])),
'out':tf.Variable(tf.random_normal([1024, n_classes]))}
biases = {'b_conv1':tf.Variable(tf.random_normal([32])),
'b_conv2':tf.Variable(tf.random_normal([64])),
'b_fc':tf.Variable(tf.random_normal([1024])),
'out':tf.Variable(tf.random_normal([n_classes]))}
x = tf.reshape(x, shape=[-1, IMG_SIZE_PX, IMG_SIZE_PX, SLICE_COUNT, 1])
conv1 = tf.nn.relu(conv3d(x, weights['W_conv1']) + biases['b_conv1'])
conv1 = maxpool3d(conv1)
conv2 = tf.nn.relu(conv3d(conv1, weights['W_conv2']) + biases['b_conv2'])
conv2 = maxpool3d(conv2)
fc = tf.reshape(conv2,[-1, 54080])
fc = tf.nn.relu(tf.matmul(fc, weights['W_fc'])+biases['b_fc'])
fc = tf.nn.dropout(fc, keep_rate)
output = tf.matmul(fc, weights['out'])+biases['out']
return output
much_data = np.load('muchdata-50-50-20.npy')
# If you are working with the basic sample data, use maybe 2 instead of 100 here... you don't have enough data to really do this
train_data = much_data[:-100]
validation_data = much_data[-100:]
def train_neural_network(x):
prediction = convolutional_neural_network(x)
cost = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(logits=prediction,labels=y) )
optimizer = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(cost)
hm_epochs = 10
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
successful_runs = 0
total_runs = 0
for epoch in range(hm_epochs):
epoch_loss = 0
for data in train_data:
total_runs += 1
try:
X = data[0]
Y = data[1]
_, c = sess.run([optimizer, cost], feed_dict={x: X, y: Y})
epoch_loss += c
successful_runs += 1
except Exception as e:
# I am passing for the sake of notebook space, but we are getting 1 shaping issue from one
# input tensor. Not sure why, will have to look into it. Guessing it's
# one of the depths that doesn't come to 20.
pass
#print(str(e))
print('Epoch', epoch+1, 'completed out of',hm_epochs,'loss:',epoch_loss)
correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
print('Accuracy:',accuracy.eval({x:[i[0] for i in validation_data], y:[i[1] for i in validation_data]}))
print('Done. Finishing accuracy:')
print('Accuracy:',accuracy.eval({x:[i[0] for i in validation_data], y:[i[1] for i in validation_data]}))
print('fitment percent:',successful_runs/total_runs)
train_neural_network(x)
我使用tensorflow-gpu版本运行它。我正在使用GTX970M并安装了CUDA并正确导入了cudnn文件。运行最后一个命令时,我收到以下错误。请帮助!
答案 0 :(得分:0)
由于某些原因,你的内存不足。 可能是您有一些使用GPU的应用程序(例如,另一个张量流会话仍处于活动状态)。如果不是这样,请检查。 (您可以使用nvidia-smi来监控)。
如果不是这样,那主要是因为您的型号尺寸和GPU内存的大小。您可以做的是尝试在CPU模式下启动它,使用tf.Variables列出所有变量,计算它代表的内存量,并查看它是否适合您的GPU。
在你这样做之前,我没有提供更多的建议。