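I have an application that uses an object detection model built with Keras and TensorFlow. I have installed tensorflow-gpu, but when the application runs I do not see my GPU being used as expected. Following this post/answer, I tried to verify that TensorFlow can use my GPU, but the check fails with an error indicating that CUDA is not enabled for my GPU (namely, "The requested device appears to be a GPU, but CUDA is not enabled."):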
$ python
Python 3.7.4 (default, Jul 9 2019, 15:11:16)
[GCC 7.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> with tf.device('/gpu:0'):
...     a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
...     b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
...     c = tf.matmul(a, b)
...
>>> with tf.Session() as sess:
...     print(sess.run(c))
...
2019-07-24 20:31:37.175391: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-24 20:31:37.204704: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2208000000 Hz
2019-07-24 20:31:37.207126: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x560dd3d22490 executing computations on platform Host. Devices:
2019-07-24 20:31:37.207196: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
Traceback (most recent call last):
File "/home/james/.virtualenvs/deep_monitor_venv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/home/james/.virtualenvs/deep_monitor_venv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1339, in _run_fn
self._extend_graph()
File "/home/james/.virtualenvs/deep_monitor_venv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1374, in _extend_graph
tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation MatMul: {{node MatMul}}was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device. The requested device appears to be a GPU, but CUDA is not enabled.
[[MatMul]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/home/james/.virtualenvs/deep_monitor_venv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/home/james/.virtualenvs/deep_monitor_venv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/home/james/.virtualenvs/deep_monitor_venv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/home/james/.virtualenvs/deep_monitor_venv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation MatMul: node MatMul (defined at <stdin>:4) was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device. The requested device appears to be a GPU, but CUDA is not enabled.
[[MatMul]]
Errors may have originated from an input operation.
Input Source operations connected to node MatMul:
a (defined at <stdin>:2)
b (defined at <stdin>:3)
>>>
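The error above lists only CPU and XLA_CPU among the available devices. A shorter way to ask the same question, without building a graph, might be to list the devices TensorFlow reports and check whether the installed binary was built with CUDA at all. The following is a minimal sketch under that assumption; `gpu_device_names` and `FakeDevice` are hypothetical names introduced here for illustration, and the TensorFlow part only runs if the package can be imported.

```python
from collections import namedtuple

# Hypothetical stand-in with the two fields read below (name, device_type);
# used only to exercise the filter without needing TensorFlow.
FakeDevice = namedtuple("FakeDevice", ["name", "device_type"])


def gpu_device_names(devices):
    """Return the names of all devices whose device_type is exactly 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


if __name__ == "__main__":
    try:
        import tensorflow as tf
        from tensorflow.python.client import device_lib
    except ImportError:
        print("TensorFlow is not importable in this environment")
    else:
        # False here would mean the imported binary is a CPU-only build.
        print("Built with CUDA:", tf.test.is_built_with_cuda())
        # An empty list matches the situation described by the error message.
        print("GPU devices:", gpu_device_names(device_lib.list_local_devices()))
```

If "Built with CUDA" prints False, the Python process is importing a CPU-only build regardless of what is installed on the system; if it prints True but the GPU list is empty, the CUDA libraries or driver are the more likely place to look.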
However, as far as I can tell, the CUDA toolkit is installed correctly and CUDA is enabled on the GPU:
$ ll /usr/local/cuda
lrwxrwxrwx 1 root root 9 Jun 12 15:59 /usr/local/cuda -> cuda-10.1/
$ nvidia-smi
Wed Jul 24 16:14:15 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  On   | 00000000:01:00.0 Off |                  N/A |
| N/A   43C    P0    N/A /  N/A |    830MiB /  4042MiB |      6%      Default |
+-------------------------------+----------------------+----------------------+
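Note that the "CUDA Version" shown by nvidia-smi is the highest CUDA version the driver supports, which is not necessarily the toolkit version a given TensorFlow binary was built against. One way to probe this further might be to resolve the /usr/local/cuda symlink and test which versioned CUDA libraries the dynamic loader can actually open. The sketch below does that; the library names passed in (`libcudart.so.10.0`, `libcudart.so.10.1`, `libcudnn.so.7`) are assumptions chosen for illustration, not names confirmed by the output above.

```python
import ctypes
import os


def resolve_cuda_home(path="/usr/local/cuda"):
    """Return the real directory behind the CUDA symlink, or None if absent."""
    return os.path.realpath(path) if os.path.exists(path) else None


def loadable(lib_names):
    """Map each shared-library name to True if the dynamic loader can open it."""
    result = {}
    for name in lib_names:
        try:
            ctypes.CDLL(name)
            result[name] = True
        except OSError:
            result[name] = False
    return result


if __name__ == "__main__":
    print("CUDA home:", resolve_cuda_home())
    # Candidate names are assumptions; adjust to the versions being checked.
    candidates = ["libcudart.so.10.0", "libcudart.so.10.1", "libcudnn.so.7"]
    for name, ok in loadable(candidates).items():
        print(name, "loadable" if ok else "NOT loadable")
```

A library that exists on disk but is reported as not loadable usually means its directory is missing from the loader's search path (for example `LD_LIBRARY_PATH`).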
The packages installed in my virtual environment:
$ pip list
Package Version
-------------------- --------
absl-py 0.7.1
arrow 0.14.2
astor 0.8.0
Cython 0.29.12
gast 0.2.2
google-pasta 0.1.7
grpcio 1.22.0
h5py 2.9.0
imutils 0.5.2
joblib 0.13.2
Keras 2.2.4
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.0
keras-resnet 0.2.0
keras-retinanet 0.5.1
Markdown 3.1.1
numpy 1.16.4
opencv-python 4.1.0.25
Pillow 6.1.0
pip 19.2.1
progressbar2 3.42.0
protobuf 3.9.0
python-dateutil 2.8.0
python-utils 2.3.0
PyYAML 5.1.1
scikit-learn 0.21.2
scipy 1.3.0
setuptools 41.0.1
six 1.12.0
SQLAlchemy 1.3.6
SQLAlchemy-Utils 0.34.1
tensorboard 1.14.0
tensorflow 1.14.0
tensorflow-estimator 1.14.0
tensorflow-gpu 1.14.0
termcolor 1.1.0
Werkzeug 0.15.5
wget 3.2
wheel 0.33.4
wrapt 1.11.2
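One detail visible in this list is that both `tensorflow` and `tensorflow-gpu` are installed at 1.14.0. Whether that is related to the error is not established in this post, but it is easy to check for mechanically. The following is a small sketch that parses `pip list` text and reports which TensorFlow variants are present; the function names are hypothetical and the sample is an excerpt of the list above.

```python
def installed_packages(pip_list_output):
    """Parse the two-column `pip list` text into {lowercased name: version}."""
    packages = {}
    for line in pip_list_output.splitlines():
        parts = line.split()
        # Skip blank lines, the "Package Version" header, and the dashed rule.
        if len(parts) != 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        packages[parts[0].lower()] = parts[1]
    return packages


def tensorflow_variants(packages):
    """Return the installed TensorFlow distributions and their versions."""
    names = ("tensorflow", "tensorflow-gpu")
    return {n: packages[n] for n in names if n in packages}


# Excerpt of the `pip list` output shown above.
sample = """\
Package Version
-------------------- --------
Keras 2.2.4
tensorboard 1.14.0
tensorflow 1.14.0
tensorflow-estimator 1.14.0
tensorflow-gpu 1.14.0
"""

if __name__ == "__main__":
    print(tensorflow_variants(installed_packages(sample)))
```

When both variants are reported, it may be worth confirming which one the interpreter actually imports, for instance with the `tf.test.is_built_with_cuda()` call.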
My system is Ubuntu 18.04.2 LTS.