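I'm building a simple feed-forward neural network with TensorFlow, and I'm using variable-size batches. I'm not using a GPU, I have 8 GB of RAM, and I'm running Python 3.5.2.
My problem is that some of my batches are too large and produce the typical out-of-memory error. I understand why that can happen, and that alone is not the issue. However, when I use Keras with the TF backend, I don't get this error. I've built an example (with fixed-size batches) that illustrates this.
Is there something wrong with my implementation? How should I handle batches that are too large?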
import numpy as np
import tensorflow as tf

n_observations = 100000
n_input = 6
batch_size = 20000

X = np.random.rand(n_observations, n_input)
Y = X[:,0] ** 3 + X[:,1] ** 2 + X[:,2] + X[:,3] + X[:,4] + X[:,5] + np.random.rand(n_observations)

n_hidden = 16
n_output = 1

def generatebatch(n_observations, batch_size):
    for batch_i in range(n_observations // batch_size):
        start = batch_i*batch_size
        end = start + batch_size
        batch_xs = X[start:end, :]
        batch_ys = Y[start:end]
        yield batch_xs, batch_ys

with tf.Session() as sess:
    # placeholders for input and target
    net_input = tf.placeholder(tf.float32, [None, n_input])
    y_true = tf.placeholder(tf.float32)

    # Hidden Layer
    W1 = tf.Variable(tf.random_normal([n_input, n_hidden]))
    b1 = tf.Variable(tf.random_normal([n_hidden]))
    net_output1 = tf.nn.relu(tf.matmul(net_input, W1) + b1)

    # Yet another Hidden Layer
    yaW1 = tf.Variable(tf.random_normal([n_hidden, n_hidden]))
    yab1 = tf.Variable(tf.random_normal([n_hidden]))
    yanet_output1 = tf.nn.relu(tf.matmul(net_output1, yaW1) + yab1)

    # Output Layer
    W2 = tf.Variable(tf.random_normal([n_hidden, n_output]))
    b2 = tf.Variable(tf.random_normal([n_output]))
    net_output2 = tf.nn.relu(tf.matmul(yanet_output1, W2) + b2)

    # The loss function
    cost = tf.reduce_mean(tf.pow(y_true - net_output2, 2))

    # Configure the optimizer
    optimizer = tf.train.AdamOptimizer().minimize(cost)

    # Initialize variables
    sess.run(tf.global_variables_initializer())

    n_epochs = 100
    for epoch_i in range(n_epochs):
        batchloss = []
        for batch_xs, batch_ys in generatebatch(n_observations, batch_size):
            _, loss = sess.run(
                [optimizer, cost],
                feed_dict={
                    net_input: batch_xs,
                    y_true: batch_ys
                })
            batchloss.append(loss)
        print(np.mean(batchloss))

# Keras version (TF backend) of the same model
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
import logging

# just to hide the deprecation warnings
logging.basicConfig(level=logging.CRITICAL)
n_input = 6
n_observations = 100000
n_hidden = 16
n_epochs = 10
batch_size = 35000
# input data
X = np.random.rand(n_observations, n_input)
Y = X[:,0] ** 3 + X[:,1] ** 2 + X[:,2] + X[:,3] + X[:,4] + X[:,5] + np.random.rand(n_observations)
# create and fit Multilayer Perceptron model
model = Sequential()
model.add(Dense(n_hidden, input_dim=n_input, activation='relu'))
model.add(Dense(n_hidden, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mse', optimizer='adam')
model.fit(X, Y, nb_epoch=n_epochs, batch_size=batch_size, verbose=1)
Answer 0 (score: 3)
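Your Y has the wrong shape. It is one-dimensional, (n_observations,), while net_output2 has shape (batch_size, 1). Because y_true is declared without a shape, TensorFlow cannot catch the mismatch, and the subtraction y_true - net_output2 can end up with an incorrectly inferred shape, e.g. (20000, 20000) instead of (20000, 1). That tensor consumes a lot of memory.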
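You can reproduce the blow-up with plain NumPy, which applies the same broadcasting rules as TensorFlow to this subtraction. The sketch below is minimal. It uses a small batch of 1000 rows to keep the demo cheap, and the helper names are hypothetical. For the question's batch_size = 20000, a single float32 matrix of that shape takes 20000 × 20000 × 4 bytes = 1.6 GB. The squared error and the backward pass each need further tensors of the same shape, so memory runs out quickly on an 8 GB machine.

```python
import numpy as np

def error_shape(batch_size):
    """Shape of `targets - predictions` when the targets are 1-D."""
    targets = np.zeros(batch_size, dtype=np.float32)           # like Y[start:end]
    predictions = np.zeros((batch_size, 1), dtype=np.float32)  # like net_output2
    return (targets - predictions).shape

def square_float32_gb(batch_size):
    """Size of one batch_size x batch_size float32 matrix, in GB (1e9 bytes)."""
    return batch_size * batch_size * 4 / 1e9

print(error_shape(1000))         # (1000, 1000): every target paired with every prediction
print(square_float32_gb(20000))  # 1.6
```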
Y = np.reshape(Y, [n_observations, 1])
Accordingly, your placeholders should have matching shapes:
net_input = tf.placeholder(tf.float32, shape=[None, n_input])
y_true = tf.placeholder(tf.float32, shape=[None, 1])
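Here is a minimal NumPy sketch of the corrected shapes. It makes a few assumptions: the sizes are small for illustration, the target is a simplified stand-in for the question's formula, and `predictions` is a hypothetical stand-in for net_output2. reshape(-1, 1) has the same effect as np.reshape(Y, [n_observations, 1]) here, but lets NumPy infer the row count.

```python
import numpy as np

n_observations, n_input = 1000, 6  # small sizes, for illustration only

X = np.random.rand(n_observations, n_input).astype(np.float32)
# Simplified stand-in target; 1-D, shape (n_observations,)
Y = X.sum(axis=1)

# Turn the 1-D target into a column vector, shape (n_observations, 1)
Y = Y.reshape(-1, 1)

# Hypothetical stand-in for net_output2, shape (n_observations, n_output=1)
predictions = np.zeros((n_observations, 1), dtype=np.float32)

error = Y - predictions    # element-wise, no broadcasting blow-up
mse = np.mean(error ** 2)  # same reduction as `cost` in the question
print(Y.shape, error.shape)  # (1000, 1) (1000, 1)
```

With y_true declared as shape=[None, 1], feeding a 1-D batch makes sess.run raise a ValueError about the incompatible shape. You get an error instead of TensorFlow silently building a (batch_size, batch_size) tensor.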
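Answer 1 (score: 0)
I think Keras overrides TensorFlow's default configuration options. Your native TensorFlow code runs fine on a GPU with smaller batch sizes (e.g. 10k, 15k). With the default configuration, however, TensorFlow assumes you want to use the GPU, and the OOM error occurs because there is not enough GPU memory.
When I changed the default behavior to use the CPU, your TensorFlow example ran correctly with the settings from the question. Here are the lines I changed: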
config = tf.ConfigProto(
    log_device_placement=True, allow_soft_placement=True
)
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess, \
        tf.device('cpu:0'):
    # placeholders for input and target