Multiple AMD GPUs with TensorFlow and OpenCL on Ubuntu 16.04

Time: 2018-10-17 22:18:36

Tags: tensorflow opencl amd-gpu sycl

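After a lot of effort:

  1. Successfully built TensorFlow with OpenCL on a fresh Ubuntu 16.04 install with amdgpu 17.50.

  2. Installed 5 identical GPUs (RX 580); all of them are reported by clinfo and computecpp_info.

  3. Ran the MNIST convolutional example: TF runs, but it uses only GPU 0 and does not see the other GPUs.

dmesg reports no errors for any of the cards, and they all appear to be ready at the lowest level, so I don't know why SYCL seems to ignore some of the cards.

Here is the computecpp_info output: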

********************************************************************************

ComputeCpp Info (CE 1.0.1)

SYCL 1.2.1 revision 3

********************************************************************************

Toolchain information:

GLIBC version: 2.23
GLIBCXX: 20160609
This version of libstdc++ is supported.

********************************************************************************


Device Info:

Discovered 5 devices matching:
  platform  : <any>
  device type : <any>

--------------------------------------------------------------------------------
Device 0:

  Device is supported                   : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                        : Ellesmere
  CL_DEVICE_VENDOR                      : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                     : 2527.3
  CL_DEVICE_TYPE                        : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 1:

  Device is supported                   : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                        : Ellesmere
  CL_DEVICE_VENDOR                      : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                     : 2527.3
  CL_DEVICE_TYPE                        : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 2:

  Device is supported                   : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                        : Ellesmere
  CL_DEVICE_VENDOR                      : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                     : 2527.3
  CL_DEVICE_TYPE                        : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 3:

  Device is supported                   : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                        : Ellesmere
  CL_DEVICE_VENDOR                      : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                     : 2527.3
  CL_DEVICE_TYPE                        : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 4:

  Device is supported                   : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                        : Ellesmere
  CL_DEVICE_VENDOR                      : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                     : 2527.3
  CL_DEVICE_TYPE                        : CL_DEVICE_TYPE_GPU

If you encounter problems when using any of these OpenCL devices, please consult
this website for known issues:
https://computecpp.codeplay.com/releases/v1.0.1/platform-support-notes

********************************************************************************

Here is the device list from TensorFlow:

$ python3 list_gpus.py
2018-10-17 23:52:44.268968: I ./tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-10-17 23:52:44.385308: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:70] Found following OpenCL devices:
2018-10-17 23:52:44.385342: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:72] id: 0, type: GPU, name: Ellesmere, vendor: Advanced Micro Devices, Inc., profile: FULL_PROFILE
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 5429869323017416982
, name: "/device:SYCL:0"
device_type: "SYCL"
memory_limit: 268435456
locality {
}
incarnation: 7347791393919061653
physical_device_desc: "id: 0, type: GPU, name: Ellesmere, vendor: Advanced Micro Devices, Inc., profile: FULL_PROFILE"
]
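
The source of list_gpus.py is not shown here. The printed list has the format that `device_lib.list_local_devices()` produces in TensorFlow 1.x, so the sketch below assumes a script along those lines. The helper names (`sycl_device_names`, `list_local_sycl_devices`) and the `FakeDevice` stand-in are hypothetical and only illustrate how to filter the list down to SYCL devices.

```python
from collections import namedtuple


def sycl_device_names(devices):
    """Return the names of all devices whose device_type is "SYCL"."""
    return [d.name for d in devices if d.device_type == "SYCL"]


def list_local_sycl_devices():
    """Query TensorFlow for local devices and keep only the SYCL ones."""
    # Imported lazily so the helper above also works without TensorFlow.
    from tensorflow.python.client import device_lib
    return sycl_device_names(device_lib.list_local_devices())


# Stand-in for the two entries printed above (only the fields used here).
FakeDevice = namedtuple("FakeDevice", ["name", "device_type"])
observed = [
    FakeDevice("/device:CPU:0", "CPU"),
    FakeDevice("/device:SYCL:0", "SYCL"),
]
print(sycl_device_names(observed))
```

On a machine with the SYCL build installed, calling `list_local_sycl_devices()` would show how many SYCL devices TensorFlow actually registered, which can then be compared with the five devices reported by computecpp_info.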

Edit: after a reboot

I really don't know whether these warnings are relevant, because they disappeared after the first run.

$ python3 list_gpus.py
2018-10-18 00:47:13.943021: I ./tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-10-18 00:47:13.952909: W ./tensorflow/core/common_runtime/sycl/sycl_device.h:45] No OpenCL accelerator nor GPU found that is supported by ComputeCpp/triSYCL trying OpenCL CPU
2018-10-18 00:47:13.952930: W ./tensorflow/core/common_runtime/sycl/sycl_device.h:52] No OpenCL CPU found that is supported by ComputeCpp/triSYCL, checking for host sycl device
2018-10-18 00:47:13.952936: W ./tensorflow/core/common_runtime/sycl/sycl_device.h:59] Found SYCL host device
2018-10-18 00:47:13.953004: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:70] Found following OpenCL devices:
2018-10-18 00:47:13.953014: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:72] id: 0, type: Host, name: Host Device, vendor: Codeplay Software Ltd., profile: FULL_PROFILE

Edit: dmesg details

[    0.000000] Linux version 4.15.0-36-generic (buildd@lcy01-amd64-017) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)) #39~16.04.1-Ubuntu SMP Tue Sep 25 08:59:23 UTC 2018 (Ubuntu 4.15.0-36.39~16.04.1-generic 4.15.18)
[    0.688885] pcie_mp2_amd: AMD(R) PCI-E MP2 Communication Driver Version: 1.0
[    1.143085] [drm] amdgpu kernel modesetting enabled.
[    1.173931] amdgpu 0000:03:00.0: enabling device (0000 -> 0003)
[    1.564757] amdgpu 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[    2.280211] amdgpu 0000:03:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[    2.280212] amdgpu 0000:03:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[    2.280322] [drm] amdgpu: 4096M of VRAM memory ready
[    2.280323] [drm] amdgpu: 4096M of GTT memory ready.
[    2.280427] amdgpu 0000:03:00.0: amdgpu: using MSI.
[    2.280439] [drm] amdgpu: irq initialized.
[    2.280452] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[    2.280690] amdgpu 0000:03:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x        (ptrval)
[    2.280758] amdgpu 0000:03:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x        (ptrval)
[    2.280784] amdgpu 0000:03:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x        (ptrval)
[    2.280842] amdgpu 0000:03:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x        (ptrval)
[    2.280903] amdgpu 0000:03:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x        (ptrval)
[    2.280965] amdgpu 0000:03:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x        (ptrval)
[    2.280985] amdgpu 0000:03:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x        (ptrval)
[    2.281001] amdgpu 0000:03:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x        (ptrval)
[    2.281015] amdgpu 0000:03:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x        (ptrval)
[    2.281028] amdgpu 0000:03:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x        (ptrval)
[    2.281332] amdgpu 0000:03:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x        (ptrval)
[    2.281348] amdgpu 0000:03:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x        (ptrval)
[    2.285039] amdgpu 0000:03:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x        (ptrval)
[    2.285056] amdgpu 0000:03:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x        (ptrval)
[    2.285069] amdgpu 0000:03:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x        (ptrval)
[    2.285578] amdgpu 0000:03:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x        (ptrval)
[    2.285594] amdgpu 0000:03:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x        (ptrval)
[    2.980155] amdgpu 0000:03:00.0: kfd not supported on this ASIC
[    2.980163] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:03:00.0 on minor 0
[    2.980215] amdgpu 0000:06:00.0: enabling device (0000 -> 0003)
[    4.068205] amdgpu 0000:06:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[    4.068206] amdgpu 0000:06:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[    4.068220] [drm] amdgpu: 4096M of VRAM memory ready
[    4.068221] [drm] amdgpu: 4096M of GTT memory ready.
[    4.068331] amdgpu 0000:06:00.0: amdgpu: using MSI.
[    4.068344] [drm] amdgpu: irq initialized.
[    4.068357] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[    4.068444] amdgpu 0000:06:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x        (ptrval)
[    4.068509] amdgpu 0000:06:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x        (ptrval)
[    4.068571] amdgpu 0000:06:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x        (ptrval)
[    4.068639] amdgpu 0000:06:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x        (ptrval)
[    4.068665] amdgpu 0000:06:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x        (ptrval)
[    4.068718] amdgpu 0000:06:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x        (ptrval)
[    4.068740] amdgpu 0000:06:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x        (ptrval)
[    4.068759] amdgpu 0000:06:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x        (ptrval)
[    4.068774] amdgpu 0000:06:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x        (ptrval)
[    4.068787] amdgpu 0000:06:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x        (ptrval)
[    4.069074] amdgpu 0000:06:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x        (ptrval)
[    4.069094] amdgpu 0000:06:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x        (ptrval)
[    4.072854] amdgpu 0000:06:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x        (ptrval)
[    4.072868] amdgpu 0000:06:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x        (ptrval)
[    4.072881] amdgpu 0000:06:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x        (ptrval)
[    4.073362] amdgpu 0000:06:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x        (ptrval)
[    4.073376] amdgpu 0000:06:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x        (ptrval)
[    4.771466] amdgpu 0000:06:00.0: kfd not supported on this ASIC
[    4.771476] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:06:00.0 on minor 2
[    4.771515] amdgpu 0000:07:00.0: enabling device (0000 -> 0003)
[    5.856168] amdgpu 0000:07:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[    5.856169] amdgpu 0000:07:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[    5.856178] [drm] amdgpu: 4096M of VRAM memory ready
[    5.856179] [drm] amdgpu: 4096M of GTT memory ready.
[    5.856284] amdgpu 0000:07:00.0: amdgpu: using MSI.
[    5.856297] [drm] amdgpu: irq initialized.
[    5.856311] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[    5.856402] amdgpu 0000:07:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x        (ptrval)
[    5.856441] amdgpu 0000:07:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x        (ptrval)
[    5.856464] amdgpu 0000:07:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x        (ptrval)
[    5.856541] amdgpu 0000:07:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x        (ptrval)
[    5.856569] amdgpu 0000:07:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x        (ptrval)
[    5.856641] amdgpu 0000:07:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x        (ptrval)
[    5.856668] amdgpu 0000:07:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x        (ptrval)
[    5.856690] amdgpu 0000:07:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x        (ptrval)
[    5.856707] amdgpu 0000:07:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x        (ptrval)
[    5.856722] amdgpu 0000:07:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x        (ptrval)
[    5.857007] amdgpu 0000:07:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x        (ptrval)
[    5.857027] amdgpu 0000:07:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x        (ptrval)
[    5.860789] amdgpu 0000:07:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x        (ptrval)
[    5.860803] amdgpu 0000:07:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x        (ptrval)
[    5.860817] amdgpu 0000:07:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x        (ptrval)
[    5.861298] amdgpu 0000:07:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x        (ptrval)
[    5.861313] amdgpu 0000:07:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x        (ptrval)
[    6.563837] amdgpu 0000:07:00.0: kfd not supported on this ASIC
[    6.563845] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:07:00.0 on minor 3
[    6.563887] amdgpu 0000:08:00.0: enabling device (0000 -> 0003)
[    7.648177] amdgpu 0000:08:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[    7.648178] amdgpu 0000:08:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[    7.648188] [drm] amdgpu: 4096M of VRAM memory ready
[    7.648188] [drm] amdgpu: 4096M of GTT memory ready.
[    7.648292] amdgpu 0000:08:00.0: amdgpu: using MSI.
[    7.648306] [drm] amdgpu: irq initialized.
[    7.648322] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[    7.648406] amdgpu 0000:08:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x        (ptrval)
[    7.648470] amdgpu 0000:08:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x        (ptrval)
[    7.648530] amdgpu 0000:08:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x        (ptrval)
[    7.648593] amdgpu 0000:08:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x        (ptrval)
[    7.648649] amdgpu 0000:08:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x        (ptrval)
[    7.648707] amdgpu 0000:08:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x        (ptrval)
[    7.648733] amdgpu 0000:08:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x        (ptrval)
[    7.648751] amdgpu 0000:08:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x        (ptrval)
[    7.648769] amdgpu 0000:08:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x        (ptrval)
[    7.648782] amdgpu 0000:08:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x        (ptrval)
[    7.649069] amdgpu 0000:08:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x        (ptrval)
[    7.649087] amdgpu 0000:08:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x        (ptrval)
[    7.652849] amdgpu 0000:08:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x        (ptrval)
[    7.652862] amdgpu 0000:08:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x        (ptrval)
[    7.652874] amdgpu 0000:08:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x        (ptrval)
[    7.653353] amdgpu 0000:08:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x        (ptrval)
[    7.653366] amdgpu 0000:08:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x        (ptrval)
[    8.355909] amdgpu 0000:08:00.0: kfd not supported on this ASIC
[    8.355916] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:08:00.0 on minor 4
[    8.355957] amdgpu 0000:09:00.0: enabling device (0000 -> 0003)
[    9.440257] amdgpu 0000:09:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[    9.440258] amdgpu 0000:09:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[    9.440268] [drm] amdgpu: 4096M of VRAM memory ready
[    9.440268] [drm] amdgpu: 4096M of GTT memory ready.
[    9.440376] amdgpu 0000:09:00.0: amdgpu: using MSI.
[    9.440390] [drm] amdgpu: irq initialized.
[    9.440406] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[    9.440499] amdgpu 0000:09:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x        (ptrval)
[    9.440563] amdgpu 0000:09:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x        (ptrval)
[    9.440625] amdgpu 0000:09:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x        (ptrval)
[    9.440690] amdgpu 0000:09:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x        (ptrval)
[    9.440753] amdgpu 0000:09:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x        (ptrval)
[    9.440808] amdgpu 0000:09:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x        (ptrval)
[    9.440831] amdgpu 0000:09:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x        (ptrval)
[    9.440849] amdgpu 0000:09:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x        (ptrval)
[    9.440865] amdgpu 0000:09:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x        (ptrval)
[    9.440880] amdgpu 0000:09:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x        (ptrval)
[    9.441167] amdgpu 0000:09:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x        (ptrval)
[    9.441184] amdgpu 0000:09:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x        (ptrval)
[    9.444946] amdgpu 0000:09:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x        (ptrval)
[    9.444964] amdgpu 0000:09:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x        (ptrval)
[    9.444976] amdgpu 0000:09:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x        (ptrval)
[    9.445456] amdgpu 0000:09:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x        (ptrval)
[    9.445469] amdgpu 0000:09:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x        (ptrval)
[   10.147558] amdgpu 0000:09:00.0: kfd not supported on this ASIC
[   10.147564] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:09:00.0 on minor 5
[   10.147606] amdgpu 0000:0a:00.0: enabling device (0000 -> 0003)
[   11.232197] amdgpu 0000:0a:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[   11.232198] amdgpu 0000:0a:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[   11.232207] [drm] amdgpu: 4096M of VRAM memory ready
[   11.232207] [drm] amdgpu: 4096M of GTT memory ready.
[   11.232309] amdgpu 0000:0a:00.0: amdgpu: using MSI.
[   11.232322] [drm] amdgpu: irq initialized.
[   11.232337] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[   11.232427] amdgpu 0000:0a:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x        (ptrval)
[   11.232488] amdgpu 0000:0a:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x        (ptrval)
[   11.232551] amdgpu 0000:0a:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x        (ptrval)
[   11.232615] amdgpu 0000:0a:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x        (ptrval)
[   11.232675] amdgpu 0000:0a:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x        (ptrval)
[   11.232699] amdgpu 0000:0a:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x        (ptrval)
[   11.232717] amdgpu 0000:0a:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x        (ptrval)
[   11.232735] amdgpu 0000:0a:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x        (ptrval)
[   11.232749] amdgpu 0000:0a:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x        (ptrval)
[   11.232763] amdgpu 0000:0a:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x        (ptrval)
[   11.233048] amdgpu 0000:0a:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x        (ptrval)
[   11.233067] amdgpu 0000:0a:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x        (ptrval)
[   11.236830] amdgpu 0000:0a:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x        (ptrval)
[   11.236848] amdgpu 0000:0a:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x        (ptrval)
[   11.236860] amdgpu 0000:0a:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x        (ptrval)
[   11.237341] amdgpu 0000:0a:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x        (ptrval)
[   11.237355] amdgpu 0000:0a:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x        (ptrval)
[   11.939330] amdgpu 0000:0a:00.0: kfd not supported on this ASIC
[   11.939336] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:0a:00.0 on minor 6
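
To see at a glance which PCI addresses the kernel initialized, the "Initialized amdgpu" lines of a log like this one can be extracted with a short script. This is only a sketch: the regular expression is an assumption based on the line format shown in this log, and the function name `initialized_amdgpu_cards` is hypothetical.

```python
import re

# Matches e.g. "Initialized amdgpu 3.23.0 20150101 for 0000:03:00.0 on minor 0".
# The pattern is inferred from the log lines above, not from driver documentation.
INIT_RE = re.compile(
    r"Initialized amdgpu \S+ \d+ for ([0-9a-f:.]+) on minor (\d+)"
)


def initialized_amdgpu_cards(dmesg_text):
    """Return (pci_address, drm_minor) pairs sorted by PCI address."""
    pairs = [(m.group(1), int(m.group(2))) for m in INIT_RE.finditer(dmesg_text)]
    # Equal-width lowercase hex addresses sort correctly as plain strings.
    return sorted(pairs)


sample = """\
[    4.771476] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:06:00.0 on minor 2
[    2.980163] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:03:00.0 on minor 0
"""
for address, minor in initialized_amdgpu_cards(sample):
    print(address, minor)
```

The first entry of the sorted result is the card with the lowest bus address, which is useful when checking which physical card a single reported device corresponds to.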

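Edit: it is not related to any particular card, just the first available card on the bus.

I tried disconnecting some of the cards, and after all the tests it is clear that SYCL always lists only the first GPU, whichever card that is: always the one with the lowest available bus number.

This also confirms that there is no difference between the cards and that all of them work (at least individually), so I think the OS is fine and I guess the problem is in SYCL.

Please help!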

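1 answer:

Answer 0: (score: 1)

As of today, multiple GPUs are not supported with TensorFlow and OpenCL, even though the documentation does not state this explicitly.

You can follow the details in the issue I opened on GitHub: https://github.com/codeplaysoftware/tensorflow/issues/16

I will update this answer if anything changes, but as the developers said, this is not a priority for them!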