How does TensorFlow manage GPU memory across multiple sessions?

Date: 2016-12-22 08:42:51

Tags: tensorflow

[tensorflow.__version__ == '0.12.head']

I have 5 networks with different structures and parameters, and I want to deploy them on a server with a GPU. As far as I understand, it is more efficient to process the data in batches, but I don't know how to determine a good batch size.

So I tried the following:

import numpy as np
import tensorflow as tf

# create_graph() builds the network; net_input_img is its input placeholder
# (both are defined elsewhere in my code).
net_output = create_graph()

sess1 = tf.Session()
sess1.run(tf.global_variables_initializer())

batch_size = 64
sess1.run(net_output, {net_input_img: np.random.rand(batch_size, 256, 256, 3)})

Running sess1.run produces a warning like:
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 1.51GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.

Despite the warning, sess1 still seems to run fine with batch_size 64.
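(Side note: as far as I understand, TensorFlow by default reserves almost all free GPU memory as soon as the first session touches the device, so this warning refers to the allocator's internal pool rather than the card being physically full. Below is a rough sketch of how the session could be told to grow its allocation on demand instead; it reuses net_output and net_input_img from the snippet above and I have not measured whether it actually changes the behaviour.)

import numpy as np
import tensorflow as tf

# allow_growth=True makes the allocator start small and grow as needed,
# instead of grabbing nearly all free GPU memory up front.
config = tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))

sess1 = tf.Session(config=config)
sess1.run(tf.global_variables_initializer())

batch_size = 64
sess1.run(net_output, {net_input_img: np.random.rand(batch_size, 256, 256, 3)})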

After that, the server received requests for the other four networks, so I loaded them with four more sessions:

# assume 5 networks have the same structure
sess2 = tf.Session()
sess2.run(tf.global_variables_initializer())
sess3 = tf.Session()
sess3.run(tf.global_variables_initializer())
sess4 = tf.Session()
sess4.run(tf.global_variables_initializer())
sess5 = tf.Session()
sess5.run(tf.global_variables_initializer())

Then I ran the first network again:

batch_size = 64
sess1.run(net_output, {net_input_img: np.random.rand(batch_size, 256, 256, 3)})

This time I got an exception:

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape...

So I decided to unload those networks:

sess2.close()
sess3.close()
sess4.close()
sess5.close()

sess1.run(net_output, {net_input_img: np.random.rand(batch_size, 256, 256, 3)})

However, I still get the ResourceExhaustedError exception when calling sess1.run after closing the other four sessions.

My questions are:

  • Does Session.close release the GPU memory? After sess[2-5] are started and then closed, why can't sess1.run do a forward pass any more?

  • Is there a better way to deploy multiple networks on a single GPU server?

I thought maybe I could start 5 processes, one per network, and call

config = tf.ConfigProto(gpu_options=tf.GPUOptions(
    per_process_gpu_memory_fraction=0.2))
sess = tf.Session(config=config)

in each process. However, I worry that the effective batch size would be smaller than with per_process_gpu_memory_fraction=1.0.
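For reference, here is a rough, untested sketch of what this per-process idea might look like; create_graph here stands in for my per-network construction code and is assumed to take the input placeholder as an argument.

import multiprocessing as mp

import numpy as np
import tensorflow as tf

def serve_network(network_id):
    # Each process gets its own CUDA context and its own 20% slice of the GPU.
    config = tf.ConfigProto(gpu_options=tf.GPUOptions(
        per_process_gpu_memory_fraction=0.2))
    with tf.Graph().as_default():
        net_input_img = tf.placeholder(tf.float32, [None, 256, 256, 3])
        net_output = create_graph(net_input_img)  # hypothetical per-network builder
        with tf.Session(config=config) as sess:
            sess.run(tf.global_variables_initializer())
            # In a real server this would wait for requests; here it just
            # runs one dummy batch to show the shape of the call.
            sess.run(net_output,
                     {net_input_img: np.random.rand(8, 256, 256, 3)})
            print('network %d done' % network_id)

if __name__ == '__main__':
    procs = [mp.Process(target=serve_network, args=(i,)) for i in range(5)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()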

[Update]

I ran the following code:

import tensorflow as tf
import tensorflow.contrib.slim as slim

def create_graph(x):
    dummy = tf.zeros([100, 100, 100, 5]) 
    return slim.repeat(x, 3, slim.conv2d, 87, [5, 5])

batch_size = 64
x = tf.zeros([batch_size, 256, 256, 3])
output = create_graph(x)

sess1 = tf.Session()
sess1.run(tf.global_variables_initializer())
sess1.run(output)

num_other_sessions = 50
other_sessions = []
for _ in range(num_other_sessions):
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    other_sessions.append(sess)

try:
    sess1.run(output)
except Exception as e:
    print(e)

for sess in other_sessions:
    sess.close()

# If I run the following two lines, the bottom sess1.run(output) could be run without error.
# del sess
# del other_sessions

try:
    sess1.run(output)
except Exception as e:
    print(e)

When I call sess1.run(output) for the first time, I only get a warning:

W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 124.62MiB.

After I start the other 50 sessions, calling sess1.run(output) raises a ResourceExhaustedError. I tried closing those sessions, but it did not help.

Part of the log output:

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so.8.0 locally
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GTX 980 Ti
major: 5 minor: 2 memoryClockRate (GHz) 1.228
pciBusID 0000:03:00.0
Total memory: 5.93GiB
Free memory: 5.84GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:03:00.0)
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 124.62MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:03:00.0)

...

I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (256):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (512):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1024):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2048):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4096):  Total Chunks: 1, Chunks in use: 0 7.0KiB allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8192):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16384):     Total Chunks: 1, Chunks in use: 0 25.5KiB allocated for chunks. 25.5KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (32768):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (65536):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (131072):    Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (262144):    Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (524288):    Total Chunks: 1, Chunks in use: 0 586.2KiB allocated for chunks. 384.0KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1048576):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2097152):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4194304):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8388608):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16777216):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (33554432):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (67108864):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (134217728):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (268435456):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 739.2KiB was 512.0KiB, Chunk State: 
I tensorflow/core/common_runtime/bfc_allocator.cc:666]   Size: 586.2KiB | Requested Size: 384.0KiB | in_use: 0, prev:   Size: 25.5KiB | Requested Size: 25.5KiB | in_use: 1, next:   Size: 739.2KiB | Requested Size: 739.2KiB | in_use: 1
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780000 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780600 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780800 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780900 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780b00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780c00 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780e00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780f00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309781000 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309781500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309781600 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309781800 of size 256

...

I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x130eb8c100 of size 7168
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1310cf5a00 of size 26112
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1310d0f200 of size 600320
I tensorflow/core/common_runtime/bfc_allocator.cc:693]      Summary of in-use Chunks by size: 
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 255 Chunks of size 256 totalling 63.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 303 Chunks of size 512 totalling 151.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 3 Chunks of size 768 totalling 2.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 48 Chunks of size 1280 totalling 60.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 3 Chunks of size 1536 totalling 4.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 49 Chunks of size 26112 totalling 1.22MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 49920 totalling 48.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 50944 totalling 49.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 98 Chunks of size 756992 totalling 70.75MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 4 Chunks of size 783104 totalling 2.99MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 50331648 totalling 48.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 1459617792 totalling 2.72GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 2904102656 totalling 2.70GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 5.54GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats: 
Limit:                  5953290240
InUse:                  5952656640
MaxInUse:               5953264128
NumAllocs:                    1259
MaxAllocSize:           2904102656

W tensorflow/core/common_runtime/bfc_allocator.cc:274] ****************************************************************************xxxxxxxxxxxxxxxxxxxxxxxx
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 739.2KiB.  See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:975] Resource exhausted: OOM when allocating tensor with shape[87,87,5,5]

1 Answer:

Answer 0 (score: 1)

Tensors account for most of the memory use in TensorFlow. Most of them are immutable and only live for the duration of a session.run() call. Others, such as variables and queues, can live longer. Depending on the kind of session you use, a DirectSession releases its variables when the session is closed.
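To make that ownership explicit, one option (a sketch on my part, not verified against your setup) is to build each network in its own tf.Graph, so each Session holds only that graph's variables and closing it releases exactly those:

import tensorflow as tf

def build_network(graph):
    # Hypothetical single-network builder; returns the input placeholder,
    # the output tensor, and the initializer for that graph's variables.
    with graph.as_default():
        x = tf.placeholder(tf.float32, [None, 256, 256, 3])
        w = tf.Variable(tf.truncated_normal([5, 5, 3, 32], stddev=0.1))
        y = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
        init = tf.global_variables_initializer()
    return x, y, init

graph2 = tf.Graph()
x2, y2, init2 = build_network(graph2)
sess2 = tf.Session(graph=graph2)
sess2.run(init2)

# ... serve requests with sess2 ...

# Tensors created inside run() calls are freed when each call returns;
# the variables that live in graph2 go away when this session is closed.
sess2.close()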

So, from your code, it is a bit strange that opening the other sessions, running the global initializers, and then closing them has an effect on memory usage. Perhaps you could simplify it and share a complete repro case so we can take a closer look.