Question

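I'm using TensorFlow with CUDA 8 on Ubuntu 14.04. My GPU is a GeForce GT 740M, and I'm new to GPUs. Sometimes, after running the same script on the GPU several times, I get a memory error that goes away after the next reboot. Thanks in advance for sharing your expertise; I really don't know how to fix this.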

Here is the error message:

        I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
        I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
        name: GeForce GT 740M
        major: 3 minor: 5 memoryClockRate (GHz) 1.0325
        pciBusID 0000:01:00.0
        Total memory: 1.96GiB
        Free memory: 118.75MiB
        I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
        I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y
        I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 740M, pci bus id: 0000:01:00.0)
        E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 118.75M (124518400 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
        E tensorflow/stream_executor/cuda/cuda_dnn.cc:397] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
        E tensorflow/stream_executor/cuda/cuda_dnn.cc:364] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
        F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) 
        Aborted (core dumped)

Answer 1

There are several possible reasons for this problem.

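Check whether you are also running an X server on the GPU, since the failure happens right from the start: the log reports only 118.75MiB free out of 1.96GiB when TensorFlow starts. Use nvidia-smi to see how much memory you actually have available.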
Make sure your CUDA driver and toolkit versions match the TensorFlow build you are running (driver 367.35 or newer, and toolkit 8.0).
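Is your card supported? (I thought it should be, but NVIDIA likes to quietly drop support for older hardware, locking you out as a way of pushing you to buy a new NVIDIA GPU.) On closer inspection, your card is supported: CUDA compute capability >= 3.0 is required, and the log reports major: 3 minor: 5, i.e. 3.5.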
You can debug your code with the TensorFlow debugger.
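Last but not least, as the comments suggest, your GPU resources do not seem to be released after your program ends. Make sure the process has actually terminated, because the GPU releases its resources only after the program calls exit().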
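
The nvidia-smi check from the first tip can be scripted. This is a minimal sketch, assuming Python 3 and that `nvidia-smi` is on the PATH; it uses nvidia-smi's `--query-gpu` CSV output to report total, used, and free memory per GPU in MiB. `parse_gpu_memory` and `query_gpu_memory` are hypothetical helper names, not part of any library.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse the output of
    nvidia-smi --query-gpu=memory.total,memory.used,memory.free --format=csv,noheader,nounits
    into one dict per GPU, with values in MiB."""
    gpus = []
    for line in csv_text.strip().splitlines():
        total, used, free = (int(field.strip()) for field in line.split(","))
        gpus.append({"total": total, "used": used, "free": free})
    return gpus


def query_gpu_memory():
    """Run nvidia-smi (assumed to be on the PATH) and return per-GPU memory in MiB."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=memory.total,memory.used,memory.free",
         "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return parse_gpu_memory(out)
```

Calling `query_gpu_memory()` before starting the script shows whether something else (an X server or a leftover run) is already holding most of the card's memory. Running plain `nvidia-smi` gives the same information interactively.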
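
The driver and hardware checks from the second and third tips can be sketched the same way. This illustrative helper assumes the thresholds stated in this answer (driver 367.35, compute capability 3.0). The driver version string can be obtained with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`, and the compute capability is read from TensorFlow's own `major: X minor: Y` log line. All function and constant names are hypothetical.

```python
import re

# Minimum versions as stated in the answer above (assumed thresholds).
MIN_DRIVER_VERSION = (367, 35)
MIN_COMPUTE_CAPABILITY = (3, 0)


def parse_version(text):
    """Turn a driver version string such as '367.35' into (367, 35)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_ok(version_text):
    """True if the NVIDIA driver is at least MIN_DRIVER_VERSION."""
    return parse_version(version_text) >= MIN_DRIVER_VERSION


def compute_capability_from_log(log_text):
    """Extract (major, minor) from TensorFlow's 'major: X minor: Y' log line."""
    match = re.search(r"major:\s*(\d+)\s+minor:\s*(\d+)", log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def compute_capability_ok(log_text):
    """True if the logged compute capability is at least MIN_COMPUTE_CAPABILITY."""
    capability = compute_capability_from_log(log_text)
    return capability is not None and capability >= MIN_COMPUTE_CAPABILITY
```

With the log from the question, `compute_capability_from_log("major: 3 minor: 5 memoryClockRate (GHz) 1.0325")` returns `(3, 5)`, which passes the 3.0 threshold. Tuples are compared element by element, so `(375, 26)` counts as newer than `(367, 35)`.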
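
For the last tip, here is a minimal sketch that lists the compute processes still holding GPU memory, assuming `nvidia-smi` is on the PATH. On some GeForce cards and older drivers, nvidia-smi does not report per-process information, so the list may come back empty or contain `[Not Supported]` fields (treated here as `None`). `terminate_processes` is a hypothetical helper that sends SIGTERM only to the PIDs you pass it.

```python
import os
import signal
import subprocess


def parse_compute_apps(csv_text):
    """Parse the output of
    nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader,nounits
    into (pid, used_mib) tuples; used_mib is None when nvidia-smi
    cannot report it (e.g. "[Not Supported]")."""
    apps = []
    for line in csv_text.strip().splitlines():
        pid_field, used_field = (field.strip() for field in line.split(",", 1))
        used_mib = int(used_field) if used_field.isdigit() else None
        apps.append((int(pid_field), used_mib))
    return apps


def list_gpu_processes():
    """Return the compute processes nvidia-smi reports on the GPU."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-compute-apps=pid,used_memory",
         "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return parse_compute_apps(out)


def terminate_processes(pids):
    """Send SIGTERM to each given PID, skipping this script's own process."""
    for pid in pids:
        if pid != os.getpid():
            os.kill(pid, signal.SIGTERM)
```

For example, `terminate_processes([pid for pid, _ in list_gpu_processes()])` asks every listed compute process to exit; inspect the list first, since it may include processes you want to keep. From a shell, `kill <PID>` does the same for a single process.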

CUDA_ERROR_OUT_OF_MEMORY ubuntu 14.04 cuda8
