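I'm running an Nvidia GTX 1080 on Ubuntu 14.04. I'm trying to implement a convolutional autoencoder with TensorFlow 1.0.1, but the program doesn't seem to use the GPU at all. I checked this with watch nvidia-smi and htop. This is the output after running the program: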
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
getting into solving the reconstruction loss
Dimension of z i.e. our latent vector is [None, 100]
Dimension of the output of the decoder is [100, 28, 28, 1]
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:0a:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x34bccc0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:09:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x34c0940
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 2 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:06:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x34c45c0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 3 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:05:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1 2 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y Y Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1: Y Y Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 2: Y Y Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 3: Y Y Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:0a:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 1080, pci bus id: 0000:09:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:2) -> (device: 2, name: GeForce GTX 1080, pci bus id: 0000:06:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:3) -> (device: 3, name: GeForce GTX 1080, pci bus id: 0000:05:00.0)
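Is there a problem in my code? I also tried pinning the graph to a specific device by wrapping graph construction in with tf.device("/gpu:0"):. Let me know if you need any further information.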
Edit 1: output of nvidia-smi
exx@ubuntu:~$ nvidia-smi
Wed Apr 19 20:50:07 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48 Driver Version: 367.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1080 Off | 0000:05:00.0 Off | N/A |
| 38% 54C P8 12W / 180W | 7715MiB / 8113MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 1080 Off | 0000:06:00.0 Off | N/A |
| 38% 55C P8 8W / 180W | 7715MiB / 8113MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 1080 Off | 0000:09:00.0 Off | N/A |
| 36% 50C P8 8W / 180W | 7715MiB / 8113MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX 1080 Off | 0000:0A:00.0 Off | N/A |
| 35% 54C P2 41W / 180W | 7833MiB / 8113MiB | 8% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 24228 C python3 7713MiB |
| 1 24228 C python3 7713MiB |
| 2 24228 C python3 7713MiB |
| 3 24228 C python3 7831MiB |
+-----------------------------------------------------------------------------+
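htop shows the process using about 100% of one CPU core. I'm concluding that it doesn't use the GPU because of the GPU-Util percentage: it shows 8% here, but it is usually 0%.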
Answer
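So you are running on the GPU, and from that point of view everything is configured correctly, but performance is really poor. Run nvidia-smi several times to get a sense of how the load behaves. It may show 100% on one reading and 8% on another.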
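A single nvidia-smi snapshot can be misleading, so it helps to sample utilization repeatedly. The sketch below assumes nvidia-smi is on the PATH and uses its --query-gpu=utilization.gpu --format=csv,noheader,nounits query, which prints one integer percentage per GPU. The helper names (parse_utilization, sample_utilization, summarize) are hypothetical.

```python
import subprocess
import time

QUERY = ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"]


def parse_utilization(csv_text):
    """Turn the query output (one integer per line, one line per GPU) into a list of ints."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def sample_utilization(samples=10, interval_s=0.5):
    """Call nvidia-smi `samples` times, `interval_s` seconds apart; return one list per sample."""
    readings = []
    for _ in range(samples):
        out = subprocess.check_output(QUERY, universal_newlines=True)
        readings.append(parse_utilization(out))
        time.sleep(interval_s)
    return readings


def summarize(readings):
    """Per-GPU (min, mean, max) utilization across all samples."""
    return [(min(v), sum(v) / len(v), max(v)) for v in zip(*readings)]
```

Typical use on the training machine is print(summarize(sample_utilization())) while the training script runs in another terminal. A GPU whose maximum stays near 0% is idle. One that swings between high and low values is probably being starved between steps.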
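Getting around 80% utilization out of the GPU is normal, because some time is lost before each run while the batch is loaded from main memory onto the GPU. (New features to improve this, GPU-side queues in TF, are coming soon.)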
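The ~80% figure follows from simple arithmetic if the copy and the compute run one after the other. This is a simplified model under that assumption: the GPU is busy only during compute, so any per-step time spent copying data or doing CPU-side work in the same thread lowers utilization. The millisecond values in the test are hypothetical, not measurements from this question.

```python
def utilization_without_overlap(compute_ms, transfer_ms, host_ms=0.0):
    """Fraction of each step the GPU spends computing, assuming the host-to-GPU
    copy (transfer_ms) and any CPU-side work (host_ms) run serially with the
    GPU compute (compute_ms) rather than overlapping it."""
    return compute_ms / (compute_ms + transfer_ms + host_ms)
```

With 8 ms of compute and 2 ms of copying per step, the GPU is busy 8 / 10 = 80% of the time. Adding 30 ms of single-threaded preprocessing to each step drops that to 20%, which is the kind of number the question reports.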
If you are getting noticeably less than about 80% from the GPU, something is wrong. Two common causes come to mind:
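1) You do a lot of preprocessing between steps. The GPU finishes each step quickly, but then sits idle while a single CPU thread does a pile of non-TensorFlow work. Move that work to its own thread and feed the training loop from a Python Queue.
2) Large chunks of data move back and forth between CPU and GPU memory. If you do this, the bandwidth between the CPU and the GPU can become the bottleneck.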
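Cause 1 can be addressed with a producer thread that preprocesses batches ahead of time and a bounded queue.Queue that the training loop reads from. This is a minimal standard-library sketch. preprocess is a hypothetical stand-in for your own CPU-side work, and the capacity of 4 is an arbitrary choice.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def preprocess(raw_batch):
    # Hypothetical placeholder for expensive CPU work (decoding, augmentation, ...).
    return [x * 2 for x in raw_batch]


def _producer(raw_batches, q):
    try:
        for raw in raw_batches:
            q.put(preprocess(raw))  # blocks while the queue is full
    except Exception as exc:        # hand errors to the consumer instead of hanging it
        q.put(exc)
        return
    q.put(_DONE)


def prefetched(raw_batches, capacity=4):
    """Yield preprocessed batches in order while the next ones are prepared in a background thread."""
    q = queue.Queue(maxsize=capacity)
    worker = threading.Thread(target=_producer, args=(raw_batches, q), daemon=True)
    worker.start()
    while True:
        item = q.get()
        if item is _DONE:
            break
        if isinstance(item, Exception):
            raise item
        yield item
    worker.join()
```

In a TF 1.x loop this would look like for batch in prefetched(raw_batches): sess.run(train_op, feed_dict={x: batch}), where train_op and x are your own graph objects. The overlap only pays off while the main thread is blocked in code that releases the GIL, such as waiting on the session run. Two pure-Python threads still compete for the GIL.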
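Try adding timers around the start and end of each training/inference batch to see whether you are spending a lot of time outside of TensorFlow operations.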
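One way to add those timers is to split each iteration into a CPU-side "prepare" phase and a "step" phase and accumulate wall-clock time for each with time.perf_counter. This is a sketch. prepare and step are hypothetical callables you would supply, for example your preprocessing function and a lambda that calls sess.run.

```python
import time


def timed_loop(batches, prepare, step):
    """Return (seconds spent in prepare, seconds spent in step) over all batches.

    `prepare` stands for non-TensorFlow CPU work; `step` stands for the
    TensorFlow call (e.g. sess.run). Both are supplied by the caller."""
    prep_s = 0.0
    step_s = 0.0
    for raw in batches:
        t0 = time.perf_counter()
        batch = prepare(raw)
        t1 = time.perf_counter()
        step(batch)
        t2 = time.perf_counter()
        prep_s += t1 - t0
        step_s += t2 - t1
    return prep_s, step_s
```

If prep_s / (prep_s + step_s) is large, most of the wall-clock time goes to work outside TensorFlow, and the GPU is waiting on the CPU rather than the other way round.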