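When I first launched the instance, all the drivers installed correctly. I then logged in to Docker, ran a new TensorFlow container, and carried on with my work. But after I shut the instance down and start it again, I run into the following problem.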
Welcome to the NVIDIA GPU Cloud Virtual Machine. This environment is provided
to enable you to easily run the Deep Learning containers from the NGC Registry.
All of the documentation for how to use NGC and this VM are found at
http://docs.nvidia.com/ngc/index.html
NVIDIA GPU Cloud (NGC) is an optimized software environment that requires the latest NVIDIA drivers to operate. If you do not download the NVIDIA drivers at this time, your instance will shut down. Would you like to download the latest NVIDIA drivers so NGC can finish installing? (Y/n)
y
Copying gs://nvidia-ngc-drivers-us-public/TESLA/shim/NVIDIA-Linux-x86_64-384.125-18.04.0-shim.run...
\ [1 files][118.5 MiB/118.5 MiB]
Operation completed over 1 objects/118.5 MiB.
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.125...
ERROR: A DKMS kernel module with version 384.125 is already installed.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find
suggestions on fixing installation problems in the README available on the Linux driver download page at
www.nvidia.com.
Error! DKMS tree already contains: nvidia-384.125
You cannot add the same module/version combo more than once.
Enabling persistence mode...
nvidia-persistenced-init/README
nvidia-persistenced-init/install.sh
nvidia-persistenced-init/systemd/nvidia-persistenced.service.template
nvidia-persistenced-init/sysv/nvidia-persistenced.template
nvidia-persistenced-init/upstart/nvidia-persistenced.conf.template
Checking for common requirements...
sed found in PATH? Yes
useradd found in PATH? Yes
userdel found in PATH? Yes
id found in PATH? Yes
Common installation/uninstallation supported
Removing previous sample System V script... done.
Creating sample System V script... done.
Removing previous sample systemd service file... done.
Creating sample systemd service file... done.
Removing previous sample Upstart service file... done.
Creating sample Upstart service file... done.
Checking for systemd requirements...
/usr/lib/systemd/system directory exists? No
/etc/systemd/system directory exists? Yes
systemctl found in PATH? Yes
systemd installation/uninstallation supported
Installation parameters:
User : nvidia-persistenced
Group : nvidia-persistenced
systemd service installation path : /etc/systemd/system
User 'nvidia-persistenced' already exists, skipping useradd...
Error: User 'nvidia-persistenced' is not in primary group 'nvidia-persistenced'.
Aborting.
Cleaning up.
After this error message, I cannot start an existing TensorFlow container (or run a new one), even though I can still log in to Docker.
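The two failures in the log above (the DKMS module that is already registered, and the `nvidia-persistenced` user that is not in its expected primary group) can be inspected with a short read-only script. This is only a diagnostic sketch, not a fix: it assumes the standard `dkms`, `id`, and `nvidia-smi` tools, and the helper names `primary_group_is`, `run_if_present`, and `report` are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Diagnostic sketch only: it reads state and changes nothing on the instance.

# Hypothetical helper: succeeds when the primary group of user $1 is $2.
primary_group_is() {
    [ "$(id -gn "$1" 2>/dev/null)" = "$2" ]
}

# Hypothetical helper: run a command only if it is installed,
# and report a non-zero exit instead of aborting.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@" || echo "$1 exited with status $?"
    else
        echo "skipped: $1 not found"
    fi
}

report() {
    echo "== DKMS modules =="
    run_if_present dkms status

    echo "== nvidia-persistenced account =="
    if primary_group_is nvidia-persistenced nvidia-persistenced; then
        echo "primary group matches"
    else
        echo "primary group mismatch or user missing"
        id nvidia-persistenced 2>/dev/null || true
    fi

    echo "== driver =="
    run_if_present nvidia-smi

    echo "== installer log (tail) =="
    if [ -r /var/log/nvidia-installer.log ]; then
        tail -n 20 /var/log/nvidia-installer.log
    fi
}

report
```

The output of `dkms status` and of the account check should show whether the 384.125 module and the `nvidia-persistenced` user are left over from the first boot, which is what the installer appears to be tripping over when it runs a second time.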
I am new to the NVIDIA NGC images on Google Cloud. Could someone help me solve this problem? Thanks in advance. https://i.stack.imgur.com/cxx62.png