GPU RAM is occupied, but no PID is listed

Posted: 2019-12-20 21:59:54

Tags: gpu nvidia ram

nvidia-smi shows the following. GPU 0 has 3771MiB (about 3.7 GiB) of memory in use, but no process is listed for GPU 0:

(base) ~/.../fast-autoaugment$ nvidia-smi
Fri Dec 20 13:48:12 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:03:00.0 Off |                  N/A |
| 23%   34C    P8     9W / 250W |   3771MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN Xp            Off  | 00000000:84:00.0  On |                  N/A |
| 38%   62C    P8    24W / 250W |   2295MiB / 12188MiB |      8%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    1      1910      G   /usr/lib/xorg/Xorg                           105MiB |
|    1      2027      G   /usr/bin/gnome-shell                          51MiB |
|    1      3086      G   /usr/lib/xorg/Xorg                          1270MiB |
|    1      3237      G   /usr/bin/gnome-shell                         412MiB |
|    1     30593      G   /proc/self/exe                               286MiB |
|    1     31849      G   ...quest-channel-token=4371017438329004833   164MiB |
+-----------------------------------------------------------------------------+

Similarly, nvtop shows the same GPU RAM usage, but the processes it lists have TYPE=Compute, and trying to kill one of those PIDs fails with an error:

(base) ~/.../fast-autoaugment$ kill 27761
bash: kill: (27761) - No such process

How can I reclaim the GPU RAM held by what appear to be ghost processes?

1 answer:

Answer 0 (score: 0)

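Use the following command to find the phantom processes that are holding GPU RAM. It lists every process that has one of the NVIDIA device files open: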

sudo fuser -v /dev/nvidia*

In my case, the output was:

(base) ~/.../fast-autoaugment$ sudo fuser -v /dev/nvidia*
                     USER        PID ACCESS COMMAND
/dev/nvidia0:        shitals     517 F.... nvtop
                     root       1910 F...m Xorg
                     gdm        2027 F.... gnome-shell
                     root       3086 F...m Xorg
                     shitals    3237 F.... gnome-shell
                     shitals   27808 F...m python
                     shitals   27809 F...m python
                     shitals   27813 F...m python
                     shitals   27814 F...m python
                     shitals   28091 F...m python
                     shitals   28092 F...m python
                     shitals   28096 F...m python

This shows processes that nvidia-smi and nvtop did not display. After I killed all of the python processes, the GPU RAM was freed.
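The step above (read the fuser list, then kill only the python processes) can be scripted. The sketch below assumes psmisc `fuser` and procps `ps`; the function names `nvidia_pids` and `filter_by_command` are hypothetical helpers, not part of any tool. It relies on the documented fuser behavior that only PIDs go to stdout while file names and access letters go to stderr.

```shell
# Sketch (meant to be sourced): find processes that hold the NVIDIA device
# files open and whose command name contains a pattern such as "python",
# i.e. a scripted version of reading `sudo fuser -v /dev/nvidia*` by eye.
# Assumes psmisc `fuser` and procps `ps`; run as root to see all users' processes.

# Print the unique PIDs that have any /dev/nvidia* file open.
# fuser writes only PIDs to stdout; file names and access letters go to stderr.
nvidia_pids() {
  { fuser /dev/nvidia* 2>/dev/null || true; } \
    | tr -s '[:space:]' '\n' \
    | { grep -E '^[0-9]+$' || true; } \
    | sort -un
}

# Read PIDs on stdin and print those whose command name contains "$1".
filter_by_command() {
  local pattern="$1" pid comm
  while read -r pid; do
    # Skip PIDs that exited between listing and lookup (ps exits non-zero).
    comm=$(ps -o comm= -p "$pid" 2>/dev/null) || continue
    case "$comm" in
      *"$pattern"*) echo "$pid" ;;
    esac
  done
}
```

Usage, assuming the functions are saved in a file named gpu_pids.sh (a hypothetical name): first review the list with `sudo sh -c '. ./gpu_pids.sh; nvidia_pids | filter_by_command python'`, then append `| xargs -r kill` to terminate them. Filtering by command name, rather than running `fuser -k` on the device files, avoids killing Xorg and gnome-shell, which also hold /dev/nvidia0 open in the output above.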

Another thing to try is resetting the GPU with:

sudo nvidia-smi --gpu-reset -i 0
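According to the nvidia-smi documentation, `--gpu-reset` requires root and refuses to run while any application (CUDA programs, the X server, monitoring tools) is using the device. In the output above GPU 1 drives Xorg, so only GPU 0 is a candidate. A minimal guard that checks the device file first might look like this; `gpu_reset_if_idle` is a hypothetical helper, and it assumes that nvidia-smi index N maps to /dev/nvidiaN, which is common but should be confirmed against the "Minor Number" field of `nvidia-smi -q -i N`.

```shell
# Sketch (meant to be sourced, run as root): attempt `nvidia-smi --gpu-reset`
# only when no process holds the GPU's device file open.
# Assumption: nvidia-smi index N corresponds to /dev/nvidiaN (verify first).

gpu_reset_if_idle() {
  local idx="${1:-}" dev
  if [ -z "$idx" ]; then
    echo "usage: gpu_reset_if_idle GPU_INDEX" >&2
    return 2
  fi
  if ! command -v fuser >/dev/null 2>&1; then
    echo "fuser not found (install psmisc)" >&2
    return 2
  fi
  dev="/dev/nvidia${idx}"
  if [ ! -e "$dev" ]; then
    echo "no such device file: $dev" >&2
    return 2
  fi
  # fuser -s prints nothing; exit status 0 means some process uses $dev.
  if fuser -s "$dev" 2>/dev/null; then
    echo "GPU ${idx}: $dev is still in use, not resetting:" >&2
    fuser -v "$dev" >&2
    return 1
  fi
  nvidia-smi --gpu-reset -i "$idx"
}
```

Returning early when the device is busy prints the same fuser table as above, so the remaining processes can be handled before the reset is retried.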