I tried the answers to How do I check if keras is using gpu version of tensorflow?, but I realized that it is only Keras that cannot see the GPU.
I reinstalled everything it needs, including tensorflow-gpu, the keras packages, and even CUDA.
I am working in Jupyter on a remote IPython kernel.
Here are the versions of the modules I have installed:
...
keras 2.2.4
keras-applications 1.0.8
keras-preprocessing 1.1.0
...
tensorflow-gpu 1.14.0
...
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
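Because the notebook runs on a remote IPython kernel, I also printed the versions from inside that kernel, to be sure it uses the same environment as the pip list above (a minimal sketch, relying only on the usual __version__ attributes):

import tensorflow as tf
import keras

# Versions as seen by the running kernel; these may differ from the
# pip list if the remote kernel points at a different environment
print(tf.__version__)     # expected: 1.14.0
print(keras.__version__)  # expected: 2.2.4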
I checked the following:
from tensorflow.python.client import device_lib

# Devices visible to TensorFlow
print(device_lib.list_local_devices())
print()

# GPUs visible to the Keras TensorFlow backend
from keras import backend
print(backend.tensorflow_backend._get_available_gpus())
print()

# GPUs visible to PyTorch, for comparison
from torch import cuda
print(cuda.is_available())
print(cuda.device_count())
print(cuda.get_device_name(cuda.current_device()))
print()
And the result:
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 15355337614284368930
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 5758691101165968939
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 17050701241022830982
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:1"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 15949544090620437264
physical_device_desc: "device: XLA_GPU device"
]
[]
True
2
GeForce GTX 1080 Ti
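As an extra check, the TF 1.x tf.test helpers give a one-line answer (a small sketch; on this setup they should report the same thing, i.e. no usable GPU):

import tensorflow as tf

# gpu_device_name() returns '' when no GPU device is registered,
# and is_gpu_available() returns False in that case
print(tf.test.gpu_device_name())
print(tf.test.is_gpu_available())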
========== ADDED =========
I also followed the answer to How to tell if tensorflow is using gpu acceleration from inside python shell? in a terminal. I tried:
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
with tf.Session() as sess:
    print(sess.run(c))
And the result:
2019-08-08 16:16:57.060679: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-08-08 16:16:57.075040: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:05:00.0
2019-08-08 16:16:57.076003: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 1 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:0a:00.0
2019-08-08 16:16:57.076256: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-08-08 16:16:57.078074: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-08-08 16:16:57.080007: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-08-08 16:16:57.080436: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-08-08 16:16:57.083506: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-08-08 16:16:57.085629: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-08-08 16:16:57.086483: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:/tink/dlgks224/conda/lib:
2019-08-08 16:16:57.086537: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2019-08-08 16:16:57.087195: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-08-08 16:16:57.117070: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2198685000 Hz
2019-08-08 16:16:57.119097: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55eab648cdc0 executing computations on platform Host. Devices:
2019-08-08 16:16:57.119231: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2019-08-08 16:16:57.119383: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-08 16:16:57.119397: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]
2019-08-08 16:16:57.483390: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55eab653adf0 executing computations on platform CUDA. Devices:
2019-08-08 16:16:57.483443: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-08-08 16:16:57.483454: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (1): GeForce GTX 1080 Ti, Compute Capability 6.1
Traceback (most recent call last):
File "/tink/dlgks224/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/tink/dlgks224/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1339, in _run_fn
self._extend_graph()
File "/tink/dlgks224/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1374, in _extend_graph
tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation MatMul: {{node MatMul}}was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:1 ]. Make sure the device specification refers to a valid device.
[[MatMul]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/tink/dlgks224/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/tink/dlgks224/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/tink/dlgks224/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/tink/dlgks224/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation MatMul: node MatMul (defined at <stdin>:4) was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:1 ]. Make sure the device specification refers to a valid device.
[[MatMul]]
Errors may have originated from an input operation.
Input Source operations connected to node MatMul:
a (defined at <stdin>:2)
b (defined at <stdin>:3)
Answer 0 (score: 0)
Solved!
It was a really silly problem.
The error had been telling me what it was all along.
I checked again and found that libcudnn.so.7 had been installed in the wrong location.
When you hit a similar error, verify this!
2019-08-08 16:16:57.086483: I tensorflow/stream_executor/platform/default/dso_loader.cc:53]
Could not dlopen library 'libcudnn.so.7'; dlerror: libcudnn.so.7:
cannot open shared object file: No such file or directory;
LD_LIBRARY_PATH: usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:/tink/dlgks224/conda/lib:
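A quick way to verify this from Python (a rough sketch, assuming a Linux setup like mine) is to look at the search path the error message prints and then try to load the library directly:

import ctypes
import os

# Print the search path that the dlopen error complains about
print(os.environ.get('LD_LIBRARY_PATH', ''))

# Try to load cuDNN roughly the way TensorFlow does; this raises
# OSError if libcudnn.so.7 is not on the loader's search path
ctypes.CDLL('libcudnn.so.7')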