nvidia-smi shows the output below: 3771 MiB (roughly 3.7 GiB) is in use on GPU 0, yet no processes are listed for GPU 0:
(base) ~/.../fast-autoaugment$ nvidia-smi
Fri Dec 20 13:48:12 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:03:00.0 Off |                  N/A |
| 23%   34C    P8     9W / 250W |   3771MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN Xp            Off  | 00000000:84:00.0  On |                  N/A |
| 38%   62C    P8    24W / 250W |   2295MiB / 12188MiB |      8%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    1      1910      G   /usr/lib/xorg/Xorg                           105MiB |
|    1      2027      G   /usr/bin/gnome-shell                          51MiB |
|    1      3086      G   /usr/lib/xorg/Xorg                          1270MiB |
|    1      3237      G   /usr/bin/gnome-shell                         412MiB |
|    1     30593      G   /proc/self/exe                               286MiB |
|    1     31849      G   ...quest-channel-token=4371017438329004833   164MiB |
+-----------------------------------------------------------------------------+
Similarly, nvtop shows the same GPU RAM usage, but the processes it lists have TYPE=Compute, and trying to kill one of their PIDs fails with an error:
(base) ~/.../fast-autoaugment$ kill 27761
bash: kill: (27761) - No such process
How can I reclaim the GPU RAM that appears to be held by ghost processes?
Answer 0 (score: 0)
Use the following command to see which phantom processes are holding GPU RAM:
sudo fuser -v /dev/nvidia*
In my case, the output was:
(base) ~/.../fast-autoaugment$ sudo fuser -v /dev/nvidia*
                     USER        PID ACCESS COMMAND
/dev/nvidia0:        shitals     517 F.... nvtop
                     root       1910 F...m Xorg
                     gdm        2027 F.... gnome-shell
                     root       3086 F...m Xorg
                     shitals    3237 F.... gnome-shell
                     shitals   27808 F...m python
                     shitals   27809 F...m python
                     shitals   27813 F...m python
                     shitals   27814 F...m python
                     shitals   28091 F...m python
                     shitals   28092 F...m python
                     shitals   28096 F...m python
This reveals processes that nvidia-smi and nvtop do not show. Once I killed all of the python processes, the GPU RAM was freed.
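If there are many such processes, picking the PIDs out by hand gets tedious. Below is a minimal sketch of how the python PIDs could be extracted from the fuser -v table. The helper name pids_for_command is made up for illustration. It assumes the column layout shown above (PID third from the end, COMMAND last) and that the command name contains no spaces.

```shell
# Hypothetical helper: read `fuser -v` output on stdin and print each PID
# whose COMMAND column equals $1, one per line, without duplicates.
pids_for_command() {
    awk -v cmd="$1" '
        # Data rows end in: USER PID ACCESS COMMAND. The header row ends in
        # the literal word COMMAND, so it only matches if cmd is "COMMAND".
        NF >= 4 && $NF == cmd && !seen[$(NF-2)]++ { print $(NF-2) }
    '
}

# Possible usage (review the PID list before killing anything):
#   sudo fuser -v /dev/nvidia* 2>&1 | pids_for_command python
#   sudo fuser -v /dev/nvidia* 2>&1 | pids_for_command python | xargs -r kill
```

The 2>&1 is there because, as far as I know, the psmisc version of fuser writes most of the verbose table to stderr, so the pipe needs both streams. xargs -r is a GNU option that skips running kill when no PIDs were found.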
Another thing to try is resetting the GPU with the following command:
sudo nvidia-smi --gpu-reset -i 0