我试图在Kubuntu 17.04上使用OpenCL作为Theano的后端,并遇到了几个我无法找到解决问题的问题。
由于我使用的是Intel Broadwell处理器(i7-5557u,如果有帮助的话),我下载了Beignet源代码(1.3.1)及其所有依赖项的副本,并make && make install
编辑了它。根据{{3}},这似乎运作正常,因为
./utest_run
命令报告100%成功clinfo
提供了关于此处理器的OpenCL功能的一大堆信息(我认为这是正确的)。在安装Beignet之前,它没有显示任何支持。接下来,我下载了一份Anaconda(4.4)并添加了Keras
(2.0.6),Theano
(0.9.0)和pygpu
(0.6.9) conda
包管理器。 Keras和Theano似乎工作正常,因为我从fast.ai课程改编的python脚本在使用CPU时显然应该做的事情(显然非常慢)。此外,一个简单的测试脚本取自你的网络,说CPU路径工作正常(instructions供参考)。
为了使Theano使用OpenCL后端,我添加了~/.theanorc
文件,其中包含以下内容:
[global]
floatX = float32
device = opencl0:0
现在,当我运行上面的pastebin脚本时,它会出现以下错误:
ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
File "/home/sahab/anaconda2/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 164, in <module>
use(config.device)
File "/home/sahab/anaconda2/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 151, in use
init_dev(device)
File "/home/sahab/anaconda2/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 60, in init_dev
sched=config.gpuarray.sched)
File "pygpu/gpuarray.pyx", line 634, in pygpu.gpuarray.init
File "pygpu/gpuarray.pyx", line 584, in pygpu.gpuarray.pygpu_init
File "pygpu/gpuarray.pyx", line 1057, in pygpu.gpuarray.GpuContext.__cinit__
GpuArrayException: clGetPlatformIDs(0, NULL, &nump): Unknown error
更简单的
测试
DEVICE="opencl0:0" python -c "import pygpu; pygpu.test()"
抛出与上面相同的错误。
我认为问题源于Beignet而不是pygpu,但我不知道如何追踪根问题,因为clinfo似乎很好。我做了一些相当多的研究,但这似乎不是人们正在做的事情,因为没有任何文档/博客文章/关于它的内容。有什么想法吗?
(为了记录,我相当肯定,鉴于我所拥有的计算机,这并没有给我太多的速度提升,但根据一些阅读,它应该至少使它成为1.5要快2倍,所以我仍然值得尝试追踪它。
我的clinfo
后代输出:
Number of platforms 2
Platform Name Intel Gen OCL Driver
Platform Vendor Intel
Platform Version OpenCL 1.2 beignet 1.3
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_khr_gl_sharing
Platform Extensions function suffix Intel
Platform Name Portable Computing Language
Platform Vendor The pocl project
Platform Version OpenCL 2.0 pocl 0.13, LLVM 3.8.1
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd
Platform Extensions function suffix POCL
Platform Name Intel Gen OCL Driver
Number of devices 1
Device Name Intel(R) Iris Graphics 6100 BroadWell U-Processor GT3
Device Vendor Intel
Device Vendor ID 0x8086
Device Version OpenCL 1.2 beignet 1.3
Driver Version 1.3
Device OpenCL C Version OpenCL C 1.2 beignet 1.3
Device Type GPU
Device Profile FULL_PROFILE
Max compute units 48
Max clock frequency 1000MHz
Device Partition (core)
Max number of sub-devices 1
Supported partition types None, None, None
Max work item dimensions 3
Max work item sizes 512x512x512
Max work group size 512
Preferred work group size multiple 16
Preferred / native vector sizes
char 16 / 8
short 8 / 8
int 4 / 4
long 2 / 2
half 0 / 8 (cl_khr_fp16)
float 4 / 4
double 0 / 2 (n/a)
Half-precision Floating-point support (cl_khr_fp16)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (n/a)
Address bits 32, Little-Endian
Global memory size 4294967296 (4GiB)
Error Correction support No
Max memory allocation 2147483648 (2GiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type Read/Write
Global Memory cache size 8192
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 65536 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 4096 bytes
Pitch alignment for 2D image buffers 1 bytes
Max 2D image size 8192x8192 pixels
Max 3D image size 8192x8192x2048 pixels
Max number of read image args 128
Max number of write image args 8
Local memory type Local
Local memory size 65536 (64KiB)
Max constant buffer size 134217728 (128MiB)
Max number of constant args 8
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 80ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
SPIR versions 1.2
printf() buffer size 1048576 (1024KiB)
Built-in kernels __cl_copy_region_align4;__cl_copy_region_align16;__cl_cpy_region_unalign_same_offset;__cl_copy_region_unalign_dst_offset;__cl_copy_region_unalign_src_offset;__cl_copy_buffer_rect;__cl_copy_image_1d_to_1d;__cl_copy_image_2d_to_2d;__cl_copy_image_3d_to_2d;__cl_copy_image_2d_to_3d;__cl_copy_image_3d_to_3d;__cl_copy_image_2d_to_buffer;__cl_copy_image_3d_to_buffer;__cl_copy_buffer_to_image_2d;__cl_copy_buffer_to_image_3d;__cl_fill_region_unalign;__cl_fill_region_align2;__cl_fill_region_align4;__cl_fill_region_align8_2;__cl_fill_region_align8_4;__cl_fill_region_align8_8;__cl_fill_region_align8_16;__cl_fill_region_align128;__cl_fill_image_1d;__cl_fill_image_1d_array;__cl_fill_image_2d;__cl_fill_image_2d_array;__cl_fill_image_3d;
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_khr_gl_sharing cl_khr_fp16
Platform Name Portable Computing Language
Number of devices 1
Device Name pthread-Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
Device Vendor GenuineIntel
Device Vendor ID 0x8086
Device Version OpenCL 2.0 pocl
Driver Version 0.13
Device OpenCL C Version OpenCL C 2.0
Device Type CPU, Default
Device Profile FULL_PROFILE
Max compute units 4
Max clock frequency 3400MHz
Device Partition (core)
Max number of sub-devices 4
Supported partition types equally, by counts
Max work item dimensions 3
Max work item sizes 4096x4096x4096
Max work group size 4096
Preferred work group size multiple 8
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 2 / 2
half 8 / 8 (n/a)
float 4 / 4
double 2 / 2 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (cl_khr_fp64)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 64, Little-Endian
Global memory size 17862586368 (16.64GiB)
Error Correction support No
Max memory allocation 17862586368 (16.64GiB)
Unified memory for Host and Device Yes
Shared Virtual Memory (SVM) capabilities (core)
Coarse-grained buffer sharing Yes
Fine-grained buffer sharing Yes
Fine-grained system sharing No
Atomics Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Preferred alignment for atomics
SVM 0 bytes
Global 0 bytes
Local 0 bytes
Max size for global variable 0
Preferred total size of global vars 0
Global Memory cache type Read/Write
Global Memory cache size 32768
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 1116411648 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 32768x32768 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 128
Max number of read/write image args <printDeviceInfo:106: get CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS : error -30>
Max number of pipe args 16
Max active pipe reservations 1
Max pipe packet size 1024
Local memory type Global
Local memory size 17862586368 (16.64GiB)
Max constant buffer size 17862586368 (16.64GiB)
Max number of constant args 8
Max size of kernel argument 1024
Queue properties (on host)
Out-of-order execution No
Profiling Yes
Queue properties (on device)
Out-of-order execution Yes
Profiling Yes
Preferred size 16384 (16KiB)
Max size 262144 (256KiB)
Max queues on device 1
Max events on device 1024
Prefer user sync for interop Yes
Profiling timer resolution 1ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
SPIR versions 1.2
printf() buffer size 1048576 (1024KiB)
Built-in kernels
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir cl_khr_int64 cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) Intel Gen OCL Driver
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [Intel]
clCreateContext(NULL, ...) [default] Success [Intel]
clCreateContext(NULL, ...) [other] Success [POCL]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name Intel Gen OCL Driver
Device Name Intel(R) Iris Graphics 6100 BroadWell U-Processor GT3
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name Intel Gen OCL Driver
Device Name Intel(R) Iris Graphics 6100 BroadWell U-Processor GT3
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.11
ICD loader Profile OpenCL 2.1