Question

I am trying to train a model with Keras on data of shape (1000000, 12, 1), and I have tried both the CPU (AMD R5-2600 @ 3.88GHz) and the GPU (RX580 @ 1411MHz).

However, the training speed on the CPU and the GPU is exactly the same.

I believe I have installed ROCm 2.6 and tensorflow-rocm 1.13.3 correctly, and the program runs without errors.

Am I doing something wrong, or is there anything I can do to make training on the GPU faster?

I am using the TensorFlow backend and running the script from a terminal, in an Anaconda environment with Python 3.5.

Using TensorFlow backend.
2019-07-13 03:11:15.890156: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-07-13 03:11:15.920477: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1531] Found device 0 with properties: 
name: Ellesmere [Radeon RX 470/480/570/570X/580/580X]
AMDGPU ISA: gfx803
memoryClockRate (GHz) 1.411
pciBusID 0000:0a:00.0
Total memory: 8.00GiB
Free memory: 7.75GiB
2019-07-13 03:11:15.920532: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1642] Adding visible gpu devices: 0
2019-07-13 03:11:15.920570: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-13 03:11:15.920591: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1059]      0 
2019-07-13 03:11:15.920610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1072] 0:   N 
2019-07-13 03:11:15.920696: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7539 MB memory) -> physical GPU (device: 0, name: Ellesmere [Radeon RX 470/480/570/570X/580/580X], pci bus id: 0000:0a:00.0)
['/job:localhost/replica:0/task:0/device:GPU:0']
WARNING:tensorflow:From /home/kenchou/anaconda3/envs/mlgpu/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/kenchou/anaconda3/envs/mlgpu/lib/python3.5/site-packages/tensorflow/python/keras/utils/losses_utils.py:170: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
(1000000, 12) (1000000,)
WARNING:tensorflow:From /home/kenchou/anaconda3/envs/mlgpu/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Train on 700000 samples, validate on 300000 samples

Everything looks fine, but training is very slow.
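One factor that may be worth checking is the number of weight updates per epoch. This is a rough sketch, not a diagnosis. The post does not state the batch size, so the values below are assumptions; 32 is the default `batch_size` of Keras `fit()`. With only 12 values per sample, each batch is a small amount of work, so a large number of tiny steps could leave the GPU waiting on per-step overhead.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of weight updates Keras performs in one epoch."""
    return math.ceil(num_samples / batch_size)


# Taken from the "Train on 700000 samples" line in the log.
TRAIN_SAMPLES = 700000

# Hypothetical batch sizes for comparison; the post does not say which
# one was used. 32 is the Keras default when batch_size is not passed.
for batch_size in (32, 256, 4096):
    print(batch_size, steps_per_epoch(TRAIN_SAMPLES, batch_size))
```

If the default was in use, passing a larger `batch_size` to `fit()` and timing one epoch on each device would show whether step count is the limiting factor here.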

RX580 with ROCm using Keras, but it is slow (very similar to using the CPU)
