Question

In my TensorFlow 2.0b program I get an error like this:

    ResourceExhaustedError: OOM when allocating tensor with shape[727272703] and type int8 on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:TopKV2]

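The error occurs after many GPU-based operations in this program have already executed successfully.

I would like to release all GPU memory associated with these past operations to avoid the error above. How can I do that in tensorflow-2.0b? And how can I check memory usage from within my program?

The only relevant information I could find uses tf.Session(), which is no longer available in TensorFlow 2.0.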

Answer 1

You may be interested in the Python 3 Bindings for the NVIDIA Management Library.

I would try something like this:

    import nvidia_smi

    nvidia_smi.nvmlInit()

    # Card id 0 is hardcoded here; there is also a call to get all
    # available card ids, so we could iterate.
    handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)

    # NVML reports these values in bytes.
    info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)

    print("Total memory:", info.total)
    print("Free memory:", info.free)
    print("Used memory:", info.used)

    nvidia_smi.nvmlShutdown()
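
The comment in the snippet above mentions that the card id is hardcoded and that one could iterate over all cards instead. Here is a rough sketch of that idea, not a definitive implementation. It assumes the bindings expose `nvmlDeviceGetCount()` alongside the calls already used above. The helper names `memory_by_device` and `format_mib` are made up for illustration, and the NVML module is passed in as an argument so the function can be tried without a GPU.

```python
def format_mib(num_bytes):
    """Format a byte count as whole mebibytes, e.g. 3145728 -> '3 MiB'."""
    return f"{num_bytes / 1024 ** 2:.0f} MiB"


def memory_by_device(nvml):
    """Return (index, total, free, used) in bytes for every visible GPU.

    `nvml` is assumed to be a module exposing the NVML calls used in the
    answer above (for example the imported `nvidia_smi` module).
    """
    nvml.nvmlInit()
    try:
        rows = []
        # Assumption: nvmlDeviceGetCount() returns the number of GPUs.
        for index in range(nvml.nvmlDeviceGetCount()):
            handle = nvml.nvmlDeviceGetHandleByIndex(index)
            info = nvml.nvmlDeviceGetMemoryInfo(handle)
            rows.append((index, info.total, info.free, info.used))
        return rows
    finally:
        # Always release the NVML handle, even if a query fails.
        nvml.nvmlShutdown()
```

With the bindings installed, this could be called as `memory_by_device(nvidia_smi)` after `import nvidia_smi`, printing each row with `format_mib`. Note that this only reports memory usage as the driver sees it; it does not release memory held by TensorFlow.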
