Question

I noticed that a recent model of mine warns that 2.37 GiB of memory could not be allocated:

W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 2.37GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.

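But my GPU is running at close to 100% utilization (in this case the input is small compared to a large model).

If I am reading this correctly, my model does not fit entirely in GPU memory. However, since the GPU is running at 100%, should I also assume that tensorflow is intelligently swapping graph elements in and out of GPU memory asynchronously?

I am curious to know what is going on under the hood there.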

Answer 1

To find out what is happening behind the scenes, add this code to your run function:

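import tensorflow as tf
from tensorflow.python.client import timeline

# config, train_step, x, y_, batch_xs and batch_ys come from your own training code.
# Request a full trace so that per-op timing and memory stats are recorded.
run_metadata = tf.RunMetadata()
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
sess = tf.Session(config=config)
sess.run(train_step,
         feed_dict={x: batch_xs, y_: batch_ys},
         options=run_options,
         run_metadata=run_metadata)

# Convert the collected step stats to Chrome trace format and save them.
trace = timeline.Timeline(step_stats=run_metadata.step_stats)
with open('timeline.ctf.json', 'w') as trace_file:
    trace_file.write(trace.generate_chrome_trace_format())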

Then open the generated timeline.ctf.json from the chrome://tracing page in Chrome, and you will see what is happening behind the scenes.
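If you would rather get a quick text summary than browse the trace visually, the saved file can also be read as plain JSON. The following is only a minimal sketch: summarize_trace and summarize_trace_file are hypothetical helper names, and it assumes the file follows the usual Chrome trace layout, with a top-level "traceEvents" list whose complete events have "ph" set to "X" and a "dur" field in microseconds.

```python
import json
from collections import defaultdict


def summarize_trace(trace, top=10):
    """Return up to `top` (event name, total duration in microseconds) pairs."""
    totals = defaultdict(int)
    for event in trace.get("traceEvents", []):
        # "X" marks a complete event; other phases (e.g. metadata) are skipped.
        if event.get("ph") == "X":
            totals[event.get("name", "?")] += event.get("dur", 0)
    # Largest total duration first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]


def summarize_trace_file(path="timeline.ctf.json", top=10):
    """Load a saved trace file and summarize it (hypothetical convenience wrapper)."""
    with open(path) as f:
        return summarize_trace(json.load(f), top)
```

Calling summarize_trace_file() after the training step has written timeline.ctf.json lists the ops that account for the most traced time, which is a reasonable first place to look.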

Most likely, GPU memory is being swapped.
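One rough way to check this guess against the trace is to count the memory-copy events it contains. This is a sketch under stated assumptions, not a definitive test: memcpy_summary is a hypothetical helper, and it assumes that host/device copies appear in the trace as complete events whose names contain "MEMCPY" (adjust the marker if your trace names them differently).

```python
import json


def memcpy_summary(trace, marker="MEMCPY"):
    """Return (count, total microseconds) of events whose name contains `marker`."""
    count, total_us = 0, 0
    for event in trace.get("traceEvents", []):
        name = event.get("name", "")
        # Assumption: copy events are complete ("X") events named with the marker.
        if event.get("ph") == "X" and marker.lower() in name.lower():
            count += 1
            total_us += event.get("dur", 0)
    return count, total_us


def memcpy_summary_file(path="timeline.ctf.json"):
    """Load a saved trace file and summarize its copy events."""
    with open(path) as f:
        return memcpy_summary(json.load(f))
```

A large copy count or copy time relative to the compute ops would be consistent with data moving between host and device during the step; a small one would suggest the warning comes from something else.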

Does tensorflow handle GPU memory swapping when a model exceeds memory capacity?
