Beignet OpenCL PyGPU problem

Time: 2017-08-16 00:02:55

Tags: python opencl theano beignet

I am trying to use OpenCL as the backend for Theano on Kubuntu 17.04, and I have run into a few problems that I cannot find solutions for.

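Since I have an Intel Broadwell processor (an i7-5557U, if that helps), I downloaded a copy of the Beignet source (1.3.1) and all of its dependencies, then built and installed it with make && make install. According to the instructions, this appears to have worked, because: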

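  1. The ./utest_run command reports 100% success.
  2. clinfo prints a large amount of information about this processor's OpenCL capabilities (which I believe is correct). Before Beignet was installed, it showed no support at all.

Next, I downloaded a copy of Anaconda (4.4) and added Keras (2.0.6), Theano (0.9.0), and pygpu (0.6.9) through the conda package manager. Keras and Theano seem to work fine, since a Python script I adapted from the fast.ai course does what it apparently should when using the CPU (albeit very slowly). In addition, a simple test script taken from the web reports that the CPU path works fine (linked for reference).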

To make Theano use the OpenCL backend, I added a ~/.theanorc file with the following contents:

    [global]
    floatX = float32
    device = opencl0:0
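To rule out the configuration file itself, the same settings can be passed through Theano's THEANO_FLAGS environment variable instead. The following is only a sketch: the helper name `theanorc_to_flags` is hypothetical, and it assumes Python 3's `configparser` (the post uses Python 2.7, where the module is named `ConfigParser`). It parses the file contents shown above and prints the equivalent flag string.

```python
import configparser

# The ~/.theanorc contents from the post, reproduced for illustration.
THEANORC_TEXT = """\
[global]
floatX = float32
device = opencl0:0
"""


def theanorc_to_flags(text, section="global"):
    """Turn one section of a .theanorc file into a THEANO_FLAGS string.

    Hypothetical helper, not part of Theano.
    """
    # Only "=" is treated as a delimiter so that the ":" inside
    # "opencl0:0" can never be mistaken for a key/value separator.
    parser = configparser.ConfigParser(delimiters=("=",))
    # configparser lowercases keys by default; keep "floatX" as written.
    parser.optionxform = str
    parser.read_string(text)
    return ",".join("%s=%s" % (key, value) for key, value in parser.items(section))


flags = theanorc_to_flags(THEANORC_TEXT)
print("THEANO_FLAGS=" + flags)
```

If running the script with this variable set produces the same traceback, the configuration file is probably not the cause.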

Now, when I run the pastebin script mentioned above, it fails with the following error:

    ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
    Traceback (most recent call last):
      File "/home/sahab/anaconda2/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 164, in <module>
        use(config.device)
      File "/home/sahab/anaconda2/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 151, in use
        init_dev(device)
      File "/home/sahab/anaconda2/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 60, in init_dev
        sched=config.gpuarray.sched)
      File "pygpu/gpuarray.pyx", line 634, in pygpu.gpuarray.init
      File "pygpu/gpuarray.pyx", line 584, in pygpu.gpuarray.pygpu_init
      File "pygpu/gpuarray.pyx", line 1057, in pygpu.gpuarray.GpuContext.__cinit__
    GpuArrayException: clGetPlatformIDs(0, NULL, &nump): Unknown error

The even simpler test

    DEVICE="opencl0:0" python -c "import pygpu; pygpu.test()"

throws the same error as above.
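Because pygpu only reports "Unknown error", it may help to call clGetPlatformIDs directly and look at the numeric status. The sketch below does this through ctypes, run with the same Anaconda Python that fails. The function names are hypothetical, and which OpenCL library `find_library` resolves to inside the Anaconda environment is an assumption that would need checking. The status values in the table are taken from the Khronos headers.

```python
import ctypes
import ctypes.util

# A few OpenCL status codes (values from the Khronos OpenCL headers).
CL_STATUS_NAMES = {
    0: "CL_SUCCESS",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -30: "CL_INVALID_VALUE",
    -1001: "CL_PLATFORM_NOT_FOUND_KHR",
}


def status_name(code):
    """Translate a numeric OpenCL status into a readable name."""
    return CL_STATUS_NAMES.get(code, "unrecognized status %d" % code)


def probe_platforms(library=None):
    """Call clGetPlatformIDs(0, NULL, &count) the same way pygpu does.

    Returns (status, platform_count), or None when no usable OpenCL
    library can be loaded.
    """
    path = library or ctypes.util.find_library("OpenCL")
    if path is None:
        return None
    try:
        cl = ctypes.CDLL(path)
        func = cl.clGetPlatformIDs
    except (OSError, AttributeError):
        return None
    func.restype = ctypes.c_int32
    func.argtypes = [
        ctypes.c_uint32,                  # num_entries
        ctypes.c_void_p,                  # platforms (NULL: only count)
        ctypes.POINTER(ctypes.c_uint32),  # num_platforms
    ]
    count = ctypes.c_uint32(0)
    status = func(0, None, ctypes.byref(count))
    return status, count.value


if __name__ == "__main__":
    result = probe_platforms()
    if result is None:
        print("No OpenCL library could be loaded from this Python")
    else:
        print("%s, %d platform(s)" % (status_name(result[0]), result[1]))
```

A status of CL_PLATFORM_NOT_FOUND_KHR would suggest that the ICD loader seen by this Python finds no vendor drivers, even though clinfo, run outside Anaconda, finds two platforms.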

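I think the problem originates in Beignet rather than pygpu, but I don't know how to track down the root cause, since the clinfo output looks fine. I have done a fair amount of research, but this doesn't seem to be something people are doing, as there is no documentation, blog post, or anything else about it. Any ideas?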
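One place to start tracing is the ICD registry. On Linux, ICD loaders conventionally discover vendor drivers through the `.icd` files in /etc/OpenCL/vendors, each of which names a driver library. The sketch below lists those files so they can be compared with the two platforms clinfo reports. The function name `list_icd_files` is hypothetical, and whether the loader used inside the Anaconda environment reads the same directory is an assumption.

```python
import os


def list_icd_files(vendors_dir="/etc/OpenCL/vendors"):
    """Map each .icd file name to the driver library it points at.

    Returns an empty dict when the directory does not exist.
    """
    entries = {}
    if not os.path.isdir(vendors_dir):
        return entries
    for name in sorted(os.listdir(vendors_dir)):
        if name.endswith(".icd"):
            with open(os.path.join(vendors_dir, name)) as handle:
                entries[name] = handle.read().strip()
    return entries


if __name__ == "__main__":
    for icd_name, library in list_icd_files().items():
        # Check separately that each listed library exists on disk.
        print("%s -> %s" % (icd_name, library))
```

If the Beignet entry is present and its library exists, the registry is probably fine, and the difference is more likely in which OpenCL library the failing process loads.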

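(For the record, I'm fairly sure that, given the computer I have, this won't give me much of a speedup, but from some reading it should be at least 1.5 to 2 times faster, so it is still worth it to me to try to track this down.)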

My clinfo output, for posterity:

    Number of platforms                               2
      Platform Name                                   Intel Gen OCL Driver
      Platform Vendor                                 Intel
      Platform Version                                OpenCL 1.2 beignet 1.3
      Platform Profile                                FULL_PROFILE
      Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_khr_gl_sharing
      Platform Extensions function suffix             Intel
    
      Platform Name                                   Portable Computing Language
      Platform Vendor                                 The pocl project
      Platform Version                                OpenCL 2.0 pocl 0.13, LLVM 3.8.1
      Platform Profile                                FULL_PROFILE
      Platform Extensions                             cl_khr_icd
      Platform Extensions function suffix             POCL
    
      Platform Name                                   Intel Gen OCL Driver
    Number of devices                                 1
      Device Name                                     Intel(R) Iris Graphics 6100 BroadWell U-Processor GT3
      Device Vendor                                   Intel
      Device Vendor ID                                0x8086
      Device Version                                  OpenCL 1.2 beignet 1.3
      Driver Version                                  1.3
      Device OpenCL C Version                         OpenCL C 1.2 beignet 1.3
      Device Type                                     GPU
      Device Profile                                  FULL_PROFILE
      Max compute units                               48
      Max clock frequency                             1000MHz
      Device Partition                                (core)
        Max number of sub-devices                     1
        Supported partition types                     None, None, None
      Max work item dimensions                        3
      Max work item sizes                             512x512x512
      Max work group size                             512
      Preferred work group size multiple              16
      Preferred / native vector sizes                 
        char                                                16 / 8       
        short                                                8 / 8       
        int                                                  4 / 4       
        long                                                 2 / 2       
        half                                                 0 / 8        (cl_khr_fp16)
        float                                                4 / 4       
        double                                               0 / 2        (n/a)
      Half-precision Floating-point support           (cl_khr_fp16)
        Denormals                                     No
        Infinity and NANs                             Yes
        Round to nearest                              Yes
        Round to zero                                 No
        Round to infinity                             No
        IEEE754-2008 fused multiply-add               No
        Support is emulated in software               No
        Correctly-rounded divide and sqrt operations  No
      Single-precision Floating-point support         (core)
        Denormals                                     No
        Infinity and NANs                             Yes
        Round to nearest                              Yes
        Round to zero                                 No
        Round to infinity                             No
        IEEE754-2008 fused multiply-add               No
        Support is emulated in software               No
        Correctly-rounded divide and sqrt operations  No
      Double-precision Floating-point support         (n/a)
      Address bits                                    32, Little-Endian
      Global memory size                              4294967296 (4GiB)
      Error Correction support                        No
      Max memory allocation                           2147483648 (2GiB)
      Unified memory for Host and Device              Yes
      Minimum alignment for any data type             128 bytes
      Alignment of base address                       1024 bits (128 bytes)
      Global Memory cache type                        Read/Write
      Global Memory cache size                        8192
      Global Memory cache line                        64 bytes
      Image support                                   Yes
        Max number of samplers per kernel             16
        Max size for 1D images from buffer            65536 pixels
        Max 1D or 2D image array size                 2048 images
        Base address alignment for 2D image buffers   4096 bytes
        Pitch alignment for 2D image buffers          1 bytes
        Max 2D image size                             8192x8192 pixels
        Max 3D image size                             8192x8192x2048 pixels
        Max number of read image args                 128
        Max number of write image args                8
      Local memory type                               Local
      Local memory size                               65536 (64KiB)
      Max constant buffer size                        134217728 (128MiB)
      Max number of constant args                     8
      Max size of kernel argument                     1024
      Queue properties                                
        Out-of-order execution                        No
        Profiling                                     Yes
      Prefer user sync for interop                    Yes
      Profiling timer resolution                      80ns
      Execution capabilities                          
        Run OpenCL kernels                            Yes
        Run native kernels                            Yes
        SPIR versions                                 1.2
      printf() buffer size                            1048576 (1024KiB)
      Built-in kernels                                __cl_copy_region_align4;__cl_copy_region_align16;__cl_cpy_region_unalign_same_offset;__cl_copy_region_unalign_dst_offset;__cl_copy_region_unalign_src_offset;__cl_copy_buffer_rect;__cl_copy_image_1d_to_1d;__cl_copy_image_2d_to_2d;__cl_copy_image_3d_to_2d;__cl_copy_image_2d_to_3d;__cl_copy_image_3d_to_3d;__cl_copy_image_2d_to_buffer;__cl_copy_image_3d_to_buffer;__cl_copy_buffer_to_image_2d;__cl_copy_buffer_to_image_3d;__cl_fill_region_unalign;__cl_fill_region_align2;__cl_fill_region_align4;__cl_fill_region_align8_2;__cl_fill_region_align8_4;__cl_fill_region_align8_8;__cl_fill_region_align8_16;__cl_fill_region_align128;__cl_fill_image_1d;__cl_fill_image_1d_array;__cl_fill_image_2d;__cl_fill_image_2d_array;__cl_fill_image_3d;
      Device Available                                Yes
      Compiler Available                              Yes
      Linker Available                                Yes
      Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_khr_gl_sharing cl_khr_fp16
    
      Platform Name                                   Portable Computing Language
    Number of devices                                 1
      Device Name                                     pthread-Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
      Device Vendor                                   GenuineIntel
      Device Vendor ID                                0x8086
      Device Version                                  OpenCL 2.0 pocl
      Driver Version                                  0.13
      Device OpenCL C Version                         OpenCL C 2.0
      Device Type                                     CPU, Default
      Device Profile                                  FULL_PROFILE
      Max compute units                               4
      Max clock frequency                             3400MHz
      Device Partition                                (core)
        Max number of sub-devices                     4
        Supported partition types                     equally, by counts
      Max work item dimensions                        3
      Max work item sizes                             4096x4096x4096
      Max work group size                             4096
      Preferred work group size multiple              8
      Preferred / native vector sizes                 
        char                                                16 / 16      
        short                                                8 / 8       
        int                                                  4 / 4       
        long                                                 2 / 2       
        half                                                 8 / 8        (n/a)
        float                                                4 / 4       
        double                                               2 / 2        (cl_khr_fp64)
      Half-precision Floating-point support           (n/a)
      Single-precision Floating-point support         (core)
        Denormals                                     No
        Infinity and NANs                             Yes
        Round to nearest                              Yes
        Round to zero                                 No
        Round to infinity                             No
        IEEE754-2008 fused multiply-add               No
        Support is emulated in software               No
        Correctly-rounded divide and sqrt operations  No
      Double-precision Floating-point support         (cl_khr_fp64)
        Denormals                                     No
        Infinity and NANs                             Yes
        Round to nearest                              Yes
        Round to zero                                 No
        Round to infinity                             No
        IEEE754-2008 fused multiply-add               No
        Support is emulated in software               No
        Correctly-rounded divide and sqrt operations  No
      Address bits                                    64, Little-Endian
      Global memory size                              17862586368 (16.64GiB)
      Error Correction support                        No
      Max memory allocation                           17862586368 (16.64GiB)
      Unified memory for Host and Device              Yes
      Shared Virtual Memory (SVM) capabilities        (core)
        Coarse-grained buffer sharing                 Yes
        Fine-grained buffer sharing                   Yes
        Fine-grained system sharing                   No
        Atomics                                       Yes
      Minimum alignment for any data type             128 bytes
      Alignment of base address                       1024 bits (128 bytes)
      Preferred alignment for atomics                 
        SVM                                           0 bytes
        Global                                        0 bytes
        Local                                         0 bytes
      Max size for global variable                    0
      Preferred total size of global vars             0
      Global Memory cache type                        Read/Write
      Global Memory cache size                        32768
      Global Memory cache line                        64 bytes
      Image support                                   Yes
        Max number of samplers per kernel             16
        Max size for 1D images from buffer            1116411648 pixels
        Max 1D or 2D image array size                 2048 images
        Max 2D image size                             32768x32768 pixels
        Max 3D image size                             2048x2048x2048 pixels
        Max number of read image args                 128
        Max number of write image args                128
        Max number of read/write image args           <printDeviceInfo:106: get CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS : error -30>
      Max number of pipe args                         16
      Max active pipe reservations                    1
      Max pipe packet size                            1024
      Local memory type                               Global
      Local memory size                               17862586368 (16.64GiB)
      Max constant buffer size                        17862586368 (16.64GiB)
      Max number of constant args                     8
      Max size of kernel argument                     1024
      Queue properties (on host)                      
        Out-of-order execution                        No
        Profiling                                     Yes
      Queue properties (on device)                    
        Out-of-order execution                        Yes
        Profiling                                     Yes
        Preferred size                                16384 (16KiB)
        Max size                                      262144 (256KiB)
      Max queues on device                            1
      Max events on device                            1024
      Prefer user sync for interop                    Yes
      Profiling timer resolution                      1ns
      Execution capabilities                          
        Run OpenCL kernels                            Yes
        Run native kernels                            Yes
        SPIR versions                                 1.2
      printf() buffer size                            1048576 (1024KiB)
      Built-in kernels                                
      Device Available                                Yes
      Compiler Available                              Yes
      Linker Available                                Yes
      Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir cl_khr_int64 cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics
    
    NULL platform behavior
      clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Intel Gen OCL Driver
      clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [Intel]
      clCreateContext(NULL, ...) [default]            Success [Intel]
      clCreateContext(NULL, ...) [other]              Success [POCL]
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
        Platform Name                                 Intel Gen OCL Driver
        Device Name                                   Intel(R) Iris Graphics 6100 BroadWell U-Processor GT3
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
      clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
        Platform Name                                 Intel Gen OCL Driver
        Device Name                                   Intel(R) Iris Graphics 6100 BroadWell U-Processor GT3
    
    ICD loader properties
      ICD loader Name                                 OpenCL ICD Loader
      ICD loader Vendor                               OCL Icd free software
      ICD loader Version                              2.2.11
      ICD loader Profile                              OpenCL 2.1
    

0 answers:

No answers