How can I tell from the log output whether TensorFlow is running on the GPU?

Date: 2019-09-25 21:18:53

Tags: python tensorflow gpu conda

I ran the command that is supposed to report whether TensorFlow is using the GPU:

>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

The command produces the output below, and I cannot tell from it whether TensorFlow is using the GPU.

2019-09-25 17:08:47.509729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Quadro P4000 major: 6 minor: 1 memoryClockRate(GHz): 1.48
pciBusID: 0000:03:00.0
2019-09-25 17:08:47.509929: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2019-09-25 17:08:47.510040: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2019-09-25 17:08:47.510139: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2019-09-25 17:08:47.510234: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2019-09-25 17:08:47.510328: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2019-09-25 17:08:47.510440: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2019-09-25 17:08:47.510483: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-25 17:08:47.510498: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2019-09-25 17:08:47.510524: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-25 17:08:47.510536: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-09-25 17:08:47.510556: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
2019-09-25 17:08:47.510713: I tensorflow/core/common_runtime/direct_session.cc:296] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device

The nvidia-smi command returns this:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 387.26                 Driver Version: 387.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P4000        Off  | 00000000:03:00.0  On |                  N/A |
| 46%   40C    P8    11W / 105W |   1240MiB /  8111MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1943      G   /usr/libexec/Xorg                            317MiB |
|    0      2047      G   /usr/bin/gnome-shell                           5MiB |
|    0      5505      G   /usr/libexec/Xorg                            109MiB |
|    0     10091      G   /usr/libexec/Xorg                            165MiB |
|    0     10952      G   ...uest-channel-token=14061294616847102337    87MiB |
+-----------------------------------------------------------------------------+

1 Answer:

Answer 0 (score: 0)

TensorFlow goes through several stages before it can run a learning kernel (function) on the GPU:

  1. Physically detect the available devices
  2. Check the library files they depend on (cuDNN files, etc.)
  3. Allocate the required memory
  4. Start the process

You are stuck at the second stage: the dlerror lines in your log show that the CUDA 10.0 libraries (libcudart.so.10.0 and friends) cannot be opened, so TensorFlow skips registering the GPU device.
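
The dlerror messages name the exact files that are missing. As a minimal sketch, assuming a Linux machine and Python's standard ctypes module, you can probe for the same libraries the log complains about:

import ctypes

# Library names taken from the dlerror lines above; adjust them to your own log.
libs = ["libcudart.so.10.0", "libcublas.so.10.0", "libcufft.so.10.0",
        "libcurand.so.10.0", "libcusolver.so.10.0", "libcusparse.so.10.0",
        "libcudnn.so.7"]
for lib in libs:
    try:
        ctypes.CDLL(lib)
        print(lib, "can be loaded")
    except OSError as err:
        print(lib, "cannot be loaded:", err)

If any of these fail, the matching CUDA 10.0 libraries are either not installed or not on the dynamic loader path (e.g. LD_LIBRARY_PATH).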

Running TensorFlow on a GPU requires several things:

  1. The NVIDIA driver
  2. CUDA (the compiler)
  3. The cuDNN library files

How to test each of these:

  1. Run nvidia-smi to check that the driver is available
  2. Run nvcc --version to check that the CUDA compiler is available
  3. Run import tensorflow as tf
  4. Run a session in TensorFlow (a sketch of steps 3 and 4 follows this list)
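
A minimal sketch of steps 3 and 4, assuming the TensorFlow 1.x API used in the question (tf.Session / tf.ConfigProto):

import tensorflow as tf

# Step 3: if this import succeeds, the Python package itself is installed.
print("TensorFlow version:", tf.__version__)
print("GPU available:", tf.test.is_gpu_available())

# Step 4: run a tiny session with device placement logging. On a working GPU
# setup the log maps the ops to /device:GPU:0, not only to XLA_CPU / XLA_GPU.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    a = tf.constant([1.0, 2.0, 3.0], name="a")
    b = tf.constant([4.0, 5.0, 6.0], name="b")
    print(sess.run(a + b))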

If you get an error at step 3, it is because TensorFlow cannot find the cuDNN files.

If you get an error at step 4 due to version incompatibility, you can check this solution on Stack Overflow or search for the issue.
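
For the version-incompatibility case, a small hedged check is to confirm that your TensorFlow build was compiled with CUDA support at all, and to print its version so you can compare it against the CUDA/cuDNN versions installed on the machine:

import tensorflow as tf

# If this prints False, the installed package is a CPU-only build and it will
# never use the GPU, regardless of driver/CUDA/cuDNN installation.
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("TensorFlow version:", tf.__version__)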

How to tell whether TensorFlow is using the GPU

There are a few ways to tell:

  • A log line like the one in your output means it is not using the GPU: Cannot dlopen some GPU libraries. Skipping registering GPU devices...

  • The nvidia-smi command can also show it: if TensorFlow is using the GPU from Python, a python process appears in the process list with very high memory usage (it will grab almost all of the available GPU memory).
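
If reading the placement log feels ambiguous, another sketch (again assuming the TensorFlow 1.x API) is to pin an op to the GPU and disable soft placement; when no GPU device was registered, running the op raises an error instead of silently falling back to the CPU:

import tensorflow as tf

config = tf.ConfigProto(allow_soft_placement=False, log_device_placement=True)
with tf.Session(config=config) as sess:
    with tf.device("/device:GPU:0"):   # force placement on the first GPU
        x = tf.constant([1.0, 2.0])
        y = x * 2.0
    # Raises an InvalidArgumentError if TensorFlow never registered a GPU device.
    print(sess.run(y))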