How can I tell from the log output whether TensorFlow is running on the GPU?

Date: 2019-09-25 21:18:53

Tags: python tensorflow gpu conda

I ran the command that is supposed to report whether TensorFlow is using the GPU:

>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

The command produces the output below, and I cannot tell from it whether TensorFlow is using the GPU.

2019-09-25 17:08:47.509729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Quadro P4000 major: 6 minor: 1 memoryClockRate(GHz): 1.48
pciBusID: 0000:03:00.0
2019-09-25 17:08:47.509929: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2019-09-25 17:08:47.510040: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2019-09-25 17:08:47.510139: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2019-09-25 17:08:47.510234: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2019-09-25 17:08:47.510328: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2019-09-25 17:08:47.510440: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2019-09-25 17:08:47.510483: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-25 17:08:47.510498: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2019-09-25 17:08:47.510524: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-25 17:08:47.510536: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-09-25 17:08:47.510556: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
2019-09-25 17:08:47.510713: I tensorflow/core/common_runtime/direct_session.cc:296] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device

The nvidia-smi command returns this:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 387.26                 Driver Version: 387.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P4000        Off  | 00000000:03:00.0  On |                  N/A |
| 46%   40C    P8    11W / 105W |   1240MiB /  8111MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1943      G   /usr/libexec/Xorg                            317MiB |
|    0      2047      G   /usr/bin/gnome-shell                           5MiB |
|    0      5505      G   /usr/libexec/Xorg                            109MiB |
|    0     10091      G   /usr/libexec/Xorg                            165MiB |
|    0     10952      G   ...uest-channel-token=14061294616847102337    87MiB |
+-----------------------------------------------------------------------------+

1 Answer:

Answer 0 (score: 0)

TensorFlow goes through several stages before it can run a learning kernel (function) on the GPU:

  1. Physically detect the available devices
  2. Check the library files they depend on (cuDNN files, etc.)
  3. Allocate the required memory
  4. Start the process

You are stuck at the second stage: the dlerror lines in your log show that the CUDA 10.0 libraries (libcudart.so.10.0 and friends) cannot be opened, so TensorFlow skips registering the GPU device.
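
The dlerror messages name the exact files that are missing. As a minimal sketch, assuming a Linux machine and Python's standard ctypes module, you can probe for the same libraries the log complains about:

import ctypes

# Library names taken from the dlerror lines above; adjust them to your own log.
libs = ["libcudart.so.10.0", "libcublas.so.10.0", "libcufft.so.10.0",
        "libcurand.so.10.0", "libcusolver.so.10.0", "libcusparse.so.10.0",
        "libcudnn.so.7"]
for lib in libs:
    try:
        ctypes.CDLL(lib)
        print(lib, "can be loaded")
    except OSError as err:
        print(lib, "cannot be loaded:", err)

If any of these fail, the matching CUDA 10.0 libraries are either not installed or not on the dynamic loader path (e.g. LD_LIBRARY_PATH).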

Running TensorFlow on a GPU requires several things:

  1. The NVIDIA driver
  2. CUDA (the compiler)
  3. The cuDNN library files

How to test each of these:

  1. Run nvidia-smi to check that the driver is available
  2. Run nvcc --version to check that the CUDA compiler is available
  3. Run import tensorflow as tf
  4. Run a session in TensorFlow (a sketch of steps 3 and 4 follows this list)
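
A minimal sketch of steps 3 and 4, assuming the TensorFlow 1.x API used in the question (tf.Session / tf.ConfigProto):

import tensorflow as tf

# Step 3: if this import succeeds, the Python package itself is installed.
print("TensorFlow version:", tf.__version__)
print("GPU available:", tf.test.is_gpu_available())

# Step 4: run a tiny session with device placement logging. On a working GPU
# setup the log maps the ops to /device:GPU:0, not only to XLA_CPU / XLA_GPU.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    a = tf.constant([1.0, 2.0, 3.0], name="a")
    b = tf.constant([4.0, 5.0, 6.0], name="b")
    print(sess.run(a + b))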

If you get an error at step 3, it is because TensorFlow cannot find the cuDNN files.

If you get an error at step 4 due to version incompatibility, you can check this solution on Stack Overflow or search for the issue.
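
For the version-incompatibility case, a small hedged check is to confirm that your TensorFlow build was compiled with CUDA support at all, and to print its version so you can compare it against the CUDA/cuDNN versions installed on the machine:

import tensorflow as tf

# If this prints False, the installed package is a CPU-only build and it will
# never use the GPU, regardless of driver/CUDA/cuDNN installation.
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("TensorFlow version:", tf.__version__)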

How to tell whether TensorFlow is using the GPU

There are a few ways to tell:

  • A log line like the one in your output means it is not using the GPU: Cannot dlopen some GPU libraries. Skipping registering GPU devices...

  • The nvidia-smi command can also show it: if TensorFlow is using the GPU from Python, a python process appears in the process list with very high memory usage (it will grab almost all of the available GPU memory).
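
If reading the placement log feels ambiguous, another sketch (again assuming the TensorFlow 1.x API) is to pin an op to the GPU and disable soft placement; when no GPU device was registered, running the op raises an error instead of silently falling back to the CPU:

import tensorflow as tf

config = tf.ConfigProto(allow_soft_placement=False, log_device_placement=True)
with tf.Session(config=config) as sess:
    with tf.device("/device:GPU:0"):   # force placement on the first GPU
        x = tf.constant([1.0, 2.0])
        y = x * 2.0
    # Raises an InvalidArgumentError if TensorFlow never registered a GPU device.
    print(sess.run(y))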