Question

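Recently I have been trying to train a deep neural network for a video classification task. The architecture is Inception v3 combined with an LSTM, trained end to end in TensorFlow. The input tensor has shape [batch_size, time_step, img_height, img_width, channel], with time_step = 40 for the LSTM. The problem is GPU memory management: unless I reduce the batch size to 1 or 2, GPU memory allocation always runs out, and such small batches severely hurt gradient convergence. Can anyone help with this? I'd like to know whether there are memory allocation tricks for the GPU, or whether something else is wrong with how my network is organized.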
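The failing tensor in the error log further down has a leading dimension of 320 = 8 × 40, i.e. batch_size × time_step. That suggests (an assumption, since the model code isn't shown) that the time axis is folded into the batch axis before Inception v3, so the CNN effectively processes 320 images per step, not 8. A minimal NumPy sketch of that fold/unfold pattern, using a tiny hypothetical 8×8×3 frame so it runs instantly (2048 is Inception v3's pooled feature size, used here as a placeholder):

```python
import numpy as np

# batch_size and time_step come from the question; the 8x8x3 frame size
# is a hypothetical stand-in for the real Inception v3 input resolution.
batch_size, time_step = 8, 40
img_height, img_width, channel = 8, 8, 3

videos = np.zeros((batch_size, time_step, img_height, img_width, channel),
                  dtype=np.float32)

# Fold time into batch so a 2D CNN sees every frame as an independent
# image: [batch * time, H, W, C].
frames = videos.reshape(-1, img_height, img_width, channel)
print(frames.shape)

# After the CNN, unfold per-frame features back to [batch, time, features]
# for the LSTM. The zeros stand in for real CNN outputs.
features = np.zeros((frames.shape[0], 2048), dtype=np.float32)
sequence = features.reshape(batch_size, time_step, -1)
print(sequence.shape)
```

With this layout every per-frame activation inside Inception v3 is 40 times larger than it would be for a single image per sample, which is why memory runs out at such small batch sizes.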

Here is my session code.

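import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only the first GPU

import tensorflow as tf  # TensorFlow 1.x API (tf.ConfigProto / tf.Session)

# Fall back to the CPU for ops without a GPU kernel, and grow GPU memory
# on demand instead of reserving almost all of it up front.
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True

# `model` and `train_next` (the input pipeline's next-batch op) are defined elsewhere.
with tf.Session(config=config) as sess:
    sess.run(model.init)
    for step in range(0, n_epochs):  # each iteration trains on one batch
        trainX, trainY = sess.run(train_next)
        sess.run(model.train_step, feed_dict={model.X: trainX, model.Y: trainY})
        print('...')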

Here is the error log when the batch size is set to 8.

Limit:                  6753043743
InUse:                  5218157056
MaxInUse:               5218209536
NumAllocs:                     967
MaxAllocSize:           1770209280
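The allocator statistics above are raw byte counts. A small sketch that converts them to GiB makes them easier to read; the dictionary simply copies the numbers from the log, and "Limit-InUse" is the nominal headroom at the moment the log was dumped (the BFC allocator can still fail below that figure if free memory is fragmented, which the log alone does not confirm):

```python
# Allocator statistics copied from the log (all values in bytes).
stats = {
    "Limit": 6753043743,
    "InUse": 5218157056,
    "MaxInUse": 5218209536,
    "MaxAllocSize": 1770209280,
}

GIB = 1024 ** 3
for name, value in stats.items():
    print(f"{name:>12}: {value / GIB:.2f} GiB")

# Nominal free memory under the allocator's limit.
headroom = stats["Limit"] - stats["InUse"]
print(f"{'Limit-InUse':>12}: {headroom / GIB:.2f} GiB")
```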

2019-04-10 15:06:37.834046: W tensorflow/core/common_runtime/bfc_allocator.cc:271] *****________*_***************_****************************____**********************************___
2019-04-10 15:06:37.834464: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at conv_ops.cc:446 : Resource exhausted: OOM when allocating tensor with shape[320,192,71,71] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[320,192,71,71] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node InceptionV3/InceptionV3/Conv2d_4a_3x3/Conv2D ]]
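The single float32 tensor that failed to allocate is already about 1.24 GB. The sketch below computes its size and, assuming the leading dimension is batch_size × 40 frames, how that one activation scales with batch size. The helper `tensor_bytes` is hypothetical, written for illustration. During training the forward activations of many layers are typically kept alive for backpropagation, so total usage is much larger than this one tensor:

```python
from math import prod

BYTES_PER_FLOAT32 = 4


def tensor_bytes(shape, bytes_per_elem=BYTES_PER_FLOAT32):
    """Size in bytes of a dense tensor with the given shape."""
    return prod(shape) * bytes_per_elem


# Shape of the tensor TensorFlow failed to allocate (from the log above).
failed = tensor_bytes([320, 192, 71, 71])
print(failed)  # 1238876160

# Assumption: leading dim = batch_size * time_step (time_step = 40),
# so this activation grows linearly with batch_size.
for batch_size in (1, 2, 8):
    size = tensor_bytes([batch_size * 40, 192, 71, 71])
    print(batch_size, f"{size / 1024**3:.2f} GiB")
```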

How can I manage GPU memory when jointly training Inception V3 and an LSTM in TensorFlow?
