tensorflow.__version__  # '0.12.head'
I have five networks with different structures and parameters, and I want to deploy them on a server with a GPU. As I understand it, processing data in batches is more efficient, but I don't know how to choose the batch size. So I tried the following:
net_output = create_graph()
sess1 = tf.Session()
sess1.run(tf.global_variables_initializer())
batch_size = 64
sess1.run(net_output, {net_input_img: np.random.rand(batch_size, 256, 256, 3)})
Running sess1.run prints a warning like:
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 1.51GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
Although the warning suggests batch_size 64 is too large, sess1.run still seems to complete with batch_size 64.
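A common way to pick a batch size empirically (my own suggestion, not from the post) is to start large and halve until a trial run stops raising out-of-memory errors. A minimal sketch, where `try_batch` is a hypothetical callable standing in for a `sess.run` that raises (e.g. `ResourceExhaustedError`) when the batch does not fit:

```python
def find_max_batch_size(try_batch, start=256, min_size=1):
    """Halve the batch size until try_batch(batch_size) succeeds.

    try_batch is any callable that raises an exception when the
    batch does not fit in GPU memory.
    """
    batch_size = start
    while batch_size >= min_size:
        try:
            try_batch(batch_size)
            return batch_size          # this size fits
        except Exception:
            batch_size //= 2           # too large: try half
    return None                        # nothing fits


# Simulated check: pretend any batch above 64 runs out of memory.
def fake_try_batch(n):
    if n > 64:
        raise MemoryError("OOM when allocating tensor")

print(find_max_batch_size(fake_try_batch, start=256))  # 64
```

Note that a run that merely prints the bfc_allocator "Ran out of memory ... not a failure" warning still counts as a success here; only an actual exception shrinks the batch.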
Later the server receives requests that use the other four networks, so I load them with four more sessions:
# assume 5 networks have the same structure
sess2 = tf.Session()
sess2.run(tf.global_variables_initializer())
sess3 = tf.Session()
sess3.run(tf.global_variables_initializer())
sess4 = tf.Session()
sess4.run(tf.global_variables_initializer())
sess5 = tf.Session()
sess5.run(tf.global_variables_initializer())
Then I ran the first network again:
batch_size = 64
sess1.run(net_output, {net_input_img: np.random.rand(batch_size, 256, 256, 3)})
This time I got an exception:
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape...
So I decided to unload the other networks:
sess2.close()
sess3.close()
sess4.close()
sess5.close()
sess1.run(net_output, {net_input_img: np.random.rand(batch_size, 256, 256, 3)})
However, I still get a ResourceExhaustedError when I call sess1.run after closing the other four sessions.
My questions are:
Does Session.close release GPU memory?
After I start and then close sess[2-5], why can sess1.run no longer run a forward pass?
Is there a better way to deploy multiple networks on a server with a GPU?
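One common workaround (my own suggestion, not from the post) is to run each network in its own OS process, because GPU memory held by a process is reliably returned to the driver when that process exits. A hedged sketch of the pattern, where `run_network` is a hypothetical placeholder for building the graph, creating a tf.Session, and running inference:

```python
import multiprocessing as mp

def run_network(net_id, batch, result_queue):
    # In a real deployment this would build the graph, create a
    # tf.Session, and run inference; a placeholder computation
    # stands in here so the control flow is visible.
    result_queue.put((net_id, sum(batch)))

def serve_request(net_id, batch):
    # One short-lived process per request: when it exits, any GPU
    # memory it held is released along with it.
    q = mp.Queue()
    p = mp.Process(target=run_network, args=(net_id, batch, q))
    p.start()
    result = q.get()   # read the result before joining
    p.join()
    return result

if __name__ == "__main__":
    print(serve_request(1, [1, 2, 3]))  # (1, 6)
```

Spawning a process per request adds startup latency (CUDA context creation is not free), so long-lived worker processes, one per network, are usually preferable in practice.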
I am thinking of starting five processes, one per network, and calling
config = tf.ConfigProto(gpu_options=tf.GPUOptions(
    per_process_gpu_memory_fraction=0.2))
sess = tf.Session(config=config)
in each process. But I worry that the feasible batch size will become smaller than with per_process_gpu_memory_fraction=1.0.
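As a back-of-the-envelope check (my own, not from the post): with N processes each capped at fraction f of the card, each network sees roughly f of the total memory, so the feasible batch size shrinks roughly in proportion. A tiny helper to see what an even split leaves per process:

```python
def per_process_memory(total_gib, num_processes, reserve_fraction=0.0):
    """GiB each process may use if the GPU is split evenly,
    after optionally reserving a fraction for runtime overhead."""
    usable = total_gib * (1.0 - reserve_fraction)
    return usable / num_processes

# The card in the logs reports 5.93 GiB total; five networks at
# fraction 0.2 each would get roughly:
print(round(per_process_memory(5.93, 5), 2))  # 1.19
```

Since the first network's single-session run already brushed against the memory limit at batch_size 64, a 0.2 cap would indeed force a much smaller batch per network.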
[Update]
I ran the following code:
import tensorflow as tf
import tensorflow.contrib.slim as slim

def create_graph(x):
    dummy = tf.zeros([100, 100, 100, 5])
    return slim.repeat(x, 3, slim.conv2d, 87, [5, 5])

batch_size = 64
x = tf.zeros([batch_size, 256, 256, 3])
output = create_graph(x)

sess1 = tf.Session()
sess1.run(tf.global_variables_initializer())
sess1.run(output)

num_other_sessions = 50
other_sessions = []
for _ in range(num_other_sessions):
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    other_sessions.append(sess)

try:
    sess1.run(output)
except Exception as e:
    print(e)

for sess in other_sessions:
    sess.close()
# If I run the following two lines, the bottom sess1.run(output) runs without error.
# del sess
# del other_sessions

try:
    sess1.run(output)
except Exception as e:
    print(e)
The first time I call sess1.run(output), I only get the warning:
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 124.62MiB.
After I start the other 50 sessions, calling sess1.run(output) raises a ResourceExhaustedError. I tried closing those sessions, but it did not help. Part of the log:
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so.8.0 locally
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 980 Ti
major: 5 minor: 2 memoryClockRate (GHz) 1.228
pciBusID 0000:03:00.0
Total memory: 5.93GiB
Free memory: 5.84GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:03:00.0)
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 124.62MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:03:00.0)
...
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (256): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (512): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1024): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2048): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4096): Total Chunks: 1, Chunks in use: 0 7.0KiB allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8192): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16384): Total Chunks: 1, Chunks in use: 0 25.5KiB allocated for chunks. 25.5KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (32768): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (65536): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (131072): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (262144): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (524288): Total Chunks: 1, Chunks in use: 0 586.2KiB allocated for chunks. 384.0KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1048576): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2097152): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4194304): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8388608): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16777216): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (33554432): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (67108864): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (134217728): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (268435456): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 739.2KiB was 512.0KiB, Chunk State:
I tensorflow/core/common_runtime/bfc_allocator.cc:666] Size: 586.2KiB | Requested Size: 384.0KiB | in_use: 0, prev: Size: 25.5KiB | Requested Size: 25.5KiB | in_use: 1, next: Size: 739.2KiB | Requested Size: 739.2KiB | in_use: 1
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780000 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780600 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780800 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780900 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780b00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780c00 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780e00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780f00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309781000 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309781500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309781600 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309781800 of size 256
...
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x130eb8c100 of size 7168
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1310cf5a00 of size 26112
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1310d0f200 of size 600320
I tensorflow/core/common_runtime/bfc_allocator.cc:693] Summary of in-use Chunks by size:
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 255 Chunks of size 256 totalling 63.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 303 Chunks of size 512 totalling 151.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 3 Chunks of size 768 totalling 2.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 48 Chunks of size 1280 totalling 60.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 3 Chunks of size 1536 totalling 4.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 49 Chunks of size 26112 totalling 1.22MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 49920 totalling 48.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 50944 totalling 49.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 98 Chunks of size 756992 totalling 70.75MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 4 Chunks of size 783104 totalling 2.99MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 50331648 totalling 48.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 1459617792 totalling 2.72GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 2904102656 totalling 2.70GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 5.54GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 5953290240
InUse: 5952656640
MaxInUse: 5953264128
NumAllocs: 1259
MaxAllocSize: 2904102656
W tensorflow/core/common_runtime/bfc_allocator.cc:274] ****************************************************************************xxxxxxxxxxxxxxxxxxxxxxxx
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 739.2KiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:975] Resource exhausted: OOM when allocating tensor with shape[87,87,5,5]
Answer 0 (score: 1)
Tensors occupy most of the memory in TensorFlow. Most of them are immutable and live only for the duration of a session.run() call. Others, such as variables and queues, can live longer. Depending on the type of session you use, a DirectSession releases its variables when the session is closed.
So, given your code, it is a bit strange that opening the other sessions, running the global initializer, and then closing them has an impact on memory usage. Perhaps you can simplify it and share a complete repro case so we can get the full picture.