Theano with opencl GPU

Date: 2017-11-24 21:54:31

Tags: opencl gpu theano

I have configured theano as follows:

[idf@localhost python]$ more ~idf/.theanorc 
[global]
device = opencl0:0
floatX = float32

[lib]
cnmem=100
[idf@localhost python]$

I also need to set:

[idf@localhost python]$ export MKL_THREADING_LAYER=GNU

Interestingly, though, if I install openblas and add

[blas]
ldflags = -lopenblas

to the .theanorc file, I no longer need:

export MKL_THREADING_LAYER=GNU
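For reference, the two configuration variants described above can be written out with Python's standard configparser module. This is only a sketch: the helper name build_theanorc is hypothetical, and the section names and values are simply the ones quoted in this post.

```python
import configparser
import io


def build_theanorc(use_openblas):
    """Return .theanorc text mirroring the settings quoted in this post.

    build_theanorc is a hypothetical helper, not part of theano.
    """
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep key case, e.g. "floatX"
    cfg["global"] = {"device": "opencl0:0", "floatX": "float32"}
    cfg["lib"] = {"cnmem": "100"}
    if use_openblas:
        # Variant that, per the post, removes the need for MKL_THREADING_LAYER
        cfg["blas"] = {"ldflags": "-lopenblas"}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(build_theanorc(use_openblas=True))
```

The returned text can be compared against an existing ~/.theanorc before overwriting anything.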

Using a program I found on the Internet, which I modified slightly to use gpuarray, I am trying to use theano with an Intel GPU via opencl:

import os
import shutil
from theano import function, config, shared, gpuarray
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in xrange(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r))
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

When I run the program, it appears to recognize the GPU, but at the end it prints the message "Used the cpu".

[idf@localhost python]$ python theanoexam1.py
Mapped name None to device opencl0:0: Intel(R) HD Graphics 5500 BroadWell U-Processor GT2
[GpuElemwise{exp,no_inplace}(<GpuArrayType<None>(float32, vector)>), HostFromGpu(gpuarray)(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 1.231896 seconds
Result is [ 1.23178029 1.61879337 1.52278054 ..., 2.20771813 2.29967737 1.62323284]
Used the cpu
[idf@localhost python]$

I am skeptical of the "Used the cpu" message: 1.231896 seconds seems awfully fast for an Intel i3 with four cores.

Is additional configuration needed to use theano with opencl? Or does this program in fact show that theano is configured to use the GPU?
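The graph printed by the program already lists GpuElemwise and HostFromGpu, which suggests the function was compiled for the GPU. One plausible explanation for the "Used the cpu" message, stated here as an assumption rather than something confirmed in this thread, is that the GPU op class inherits from the CPU Elemwise class, so a plain isinstance test matches both. The sketch below uses stand-in classes (not the real theano ones) to show how that would play out, and how also checking the class name separates the two cases.

```python
class Elemwise(object):
    """Stand-in for the CPU elementwise op class (not the real theano class)."""


class GpuElemwise(Elemwise):
    """Stand-in for the GPU op; assumed here to subclass Elemwise."""


def naive_check(ops):
    """The check from the test program: any Elemwise instance counts as CPU."""
    return any(isinstance(op, Elemwise) for op in ops)


def name_aware_check(ops):
    """Count an op as CPU only if its class name does not contain 'Gpu'."""
    return any(
        isinstance(op, Elemwise) and "Gpu" not in type(op).__name__
        for op in ops
    )


if __name__ == "__main__":
    gpu_graph = [GpuElemwise()]
    print("naive check says cpu:", naive_check(gpu_graph))
    print("name-aware check says cpu:", name_aware_check(gpu_graph))
```

If the assumption holds for the installed theano version, adding the same name test to the condition in the test program should make it print "Used the gpu" for the graph shown above.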

1 Answer:

Answer 0 (score: 0)

First of all, thanks for posting.

I am running on Ubuntu 16.04 with Conda, and I installed libgpuarray manually; all of this is well documented online.

I used the same test program you did (thanks for providing it).

So here is my setup:

export MKL_THREADING_LAYER=GNU

The file ~/.theanorc looks like this:

[global]
device = opencl0:0
floatX = float32

[lib]
cnmem=100

When I run the code

python test.py

I see the output

DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument
Assuming 131072kB available aperture size.
May lead to reduced performance or incorrect rendering.
get chip id failed: -1 [2]
param: 4, val: 0
Mapped name None to device opencl0:0: Ellesmere 
[GpuElemwise{exp,no_inplace}(<GpuArrayType<None>(float32, vector)>), HostFromGpu(gpuarray)(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.282664 seconds
Result is [1.2317803 1.6187935 1.5227805 ... 2.207718  2.2996776 1.6232328]
Used the gpu

I could not figure out how to use the second GPU (also OpenCL), but I am happy that I have at least one GPU running.
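For the second GPU, one thing that might be worth trying (untested here) is changing the indices in the device name. The device strings in this thread have the form opencl0:0; the sketch below assumes the first number is the OpenCL platform index and the second the device index within that platform. The helper theano_flags_for is hypothetical; it only builds a value for the THEANO_FLAGS environment variable, which has to be set before theano is imported.

```python
import os


def theano_flags_for(platform, device, float_type="float32"):
    """Build a THEANO_FLAGS value for a given OpenCL platform/device pair.

    Hypothetical helper; the meaning of the two indices is an assumption.
    """
    name = "opencl%d:%d" % (platform, device)
    return "device=%s,floatX=%s" % (name, float_type)


if __name__ == "__main__":
    # Try the second device on the first platform; must run before
    # "import theano" for the setting to take effect.
    os.environ["THEANO_FLAGS"] = theano_flags_for(0, 1)
    print(os.environ["THEANO_FLAGS"])
```

If the second card sits on a different OpenCL platform, swapping the indices (for example platform 1, device 0) would be the other combination to try.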