CUDA_ERROR_OUT_OF_MEMORY: out of memory (not during training)

Time: 2021-01-11 04:15:25

Tags: python c++ tensorflow

I have been using TensorFlow 2.3.0 with CUDA 10.1 and cuDNN 7.6.5 on Windows 10 for a while.

Driver API nvidia-smi
Thu Jan  7 15:50:14 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 461.09       Driver Version: 461.09       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 106... WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   57C    P8     8W /  N/A |     92MiB /  6144MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Runtime API nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:12:52_Pacific_Daylight_Time_2019
Cuda compilation tools, release 10.1, V10.1.243
GPU: NVIDIA GeForce GTX 1060 with Max-Q Design 

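Until recently I could train TensorFlow models and run inference without problems. A few days ago I started getting "CUDA_ERROR_OUT_OF_MEMORY: out of memory" when running inference only, on a model that I could previously run inference on. The inference code has not changed either. Is some other process filling up CUDA memory? I have already tried removing and reinstalling CUDA and cuDNN.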
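To check whether something else is holding GPU memory right before inference starts, a small script along these lines might help. It is only a sketch: it assumes nvidia-smi is on the PATH and that the driver reports memory.used and memory.total as plain integers in MiB. The helper names parse_gpu_memory and query_gpu_memory are made up for this example.

```python
import subprocess

# nvidia-smi query for per-GPU memory, printed as CSV without header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse lines such as '0, 92, 6144' into dicts with values in MiB.

    Assumes every field is an integer; a value like '[N/A]' would raise.
    """
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, used, total = (field.strip() for field in line.split(","))
        rows.append(
            {"index": int(index), "used_mib": int(used), "total_mib": int(total)}
        )
    return rows


def query_gpu_memory():
    """Run nvidia-smi and return the parsed memory usage for each GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    try:
        for gpu in query_gpu_memory():
            free_mib = gpu["total_mib"] - gpu["used_mib"]
            print(
                f"GPU {gpu['index']}: {gpu['used_mib']} MiB used, "
                f"{free_mib} MiB free of {gpu['total_mib']} MiB"
            )
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"Could not query nvidia-smi: {exc}")
```

Running this immediately before and after the inference script would show whether the memory is already occupied before TensorFlow starts, or whether it is the inference process itself that fills it.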

Here is the error log from when I run inference

I also ran cuda-memcheck to check for any leaks.

Here are the logs of cuda-memcheck --leak-check full

Any help is greatly appreciated!

0 answers:

No answers