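I am running a Docker image built on the base image nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04.
I installed tensorflow-gpu 1.14.0 with conda, but for some reason the conda install does not use the GPU and does not even detect CUDA.
Details: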
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
$ nvidia-smi
Fri Nov  6 06:42:31 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:5E:00.0 Off |                    0 |
| N/A   30C    P8    15W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
$ conda list
...
tensorflow 1.14.0 h4531e10_0 conda-forge
tensorflow-base 1.14.0 py37h4531e10_0 conda-forge
tensorflow-estimator 1.14.0 py37h5ca1d4c_0 conda-forge
tensorflow-gpu 1.14.0 h0d30ee6_0 defaults
...
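The rows shown above come from two different channels: `tensorflow`, `tensorflow-base` and `tensorflow-estimator` from conda-forge, and only `tensorflow-gpu` from defaults. That mix could matter if the conda-forge build turns out to have no GPU support, which is an assumption and not something verified here. A minimal sketch for grouping the packages by channel, assuming each `conda list` row is whitespace-separated name, version, build and channel; the helper names `parse_conda_list` and `channels_used` are hypothetical:

```python
def parse_conda_list(text):
    """Return {package: (version, build, channel)} from `conda list` rows."""
    packages = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip comment lines, elisions ("...") and anything that is not a package row.
        if len(fields) not in (3, 4) or fields[0].startswith("#"):
            continue
        # Assumption: a row without a fourth column has no channel shown.
        channel = fields[3] if len(fields) == 4 else ""
        packages[fields[0]] = (fields[1], fields[2], channel)
    return packages


def channels_used(packages, prefix="tensorflow"):
    """Group package names starting with `prefix` by the channel they came from."""
    by_channel = {}
    for name, (_, _, channel) in sorted(packages.items()):
        if name.startswith(prefix):
            by_channel.setdefault(channel, []).append(name)
    return by_channel


# Rows copied from the listing above.
listing = """\
...
tensorflow 1.14.0 h4531e10_0 conda-forge
tensorflow-base 1.14.0 py37h4531e10_0 conda-forge
tensorflow-estimator 1.14.0 py37h5ca1d4c_0 conda-forge
tensorflow-gpu 1.14.0 h0d30ee6_0 defaults
...
"""

for channel, names in channels_used(parse_conda_list(listing)).items():
    print(channel, "->", ", ".join(names))
```

If more than one channel shows up for the same framework, comparing the build strings of each package is a reasonable next check.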
In a Python session:
import tensorflow as tf
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/opt/conda/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
tf.__version__
'1.14.0'
tf.test.is_built_with_cuda()
False
tf.test.is_gpu_available()
2020-11-06 06:53:29.829226: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2020-11-06 06:53:29.850287: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2500000000 Hz
2020-11-06 06:53:29.858346: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x556b4f1a0560 executing computations on platform Host. Devices:
2020-11-06 06:53:29.858397: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
False
However, installing tensorflow-gpu with pip works fine (shown for demonstration only; my requirements call for conda):
tf.__version__
'1.14.0'
tf.test.is_built_with_cuda()
True
tf.test.is_gpu_available()
2020-11-06 07:01:14.032230: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-11-06 07:01:14.059798: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2500000000 Hz
2020-11-06 07:01:14.067201: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b33b4840f0 executing computations on platform Host. Devices:
2020-11-06 07:01:14.067240: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2020-11-06 07:01:14.068406: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-11-06 07:01:14.089496: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:5e:00.0
2020-11-06 07:01:14.089732: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-11-06 07:01:14.089856: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-11-06 07:01:14.089947: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-11-06 07:01:14.090037: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-11-06 07:01:14.090131: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-11-06 07:01:14.090220: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-11-06 07:01:14.094573: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-11-06 07:01:14.094606: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2020-11-06 07:01:14.219019: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-06 07:01:14.219061: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2020-11-06 07:01:14.219081: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2020-11-06 07:01:14.223660: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b33e64f7c0 executing computations on platform CUDA. Devices:
2020-11-06 07:01:14.223692: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Tesla T4, Compute Capability 7.5
False
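The log above names the shared libraries this build tries to open, all with a `.so.10.0` suffix, while `nvcc` in the image reports release 10.2. A minimal sketch for checking which of those sonames the dynamic loader can resolve inside the container, using `ctypes.CDLL`; the helper name `unloadable` and its `loader` parameter are hypothetical and exist only so the check can be exercised without a GPU:

```python
import ctypes

# Sonames copied from the "Could not dlopen library" lines in the log.
CUDA_10_0_LIBS = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
]


def unloadable(names, loader=ctypes.CDLL):
    """Return the names that `loader` fails to open (ctypes raises OSError)."""
    missing = []
    for name in names:
        try:
            loader(name)
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in unloadable(CUDA_10_0_LIBS):
        print("cannot load:", name)
```

Running it in the same environment as the session above shows whether the loader search path, including `LD_LIBRARY_PATH`, contains the versions the binary asks for.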