I have configured theano as follows:
[idf@localhost python]$ more ~idf/.theanorc
[global]
device = opencl0:0
floatX = float32
[lib]
cnmem=100
[idf@localhost python]$
I also needed:
[idf@localhost python]$ export MKL_THREADING_LAYER=GNU
Interestingly though, if I install openblas and add

[blas]
ldflags = -lopenblas

to the .theanorc file, I no longer need:
export MKL_THREADING_LAYER=GNU
Using a program I found on the internet, which I modified slightly to use gpuarray, I tried to use theano with the Intel GPU via opencl:

import os
import shutil
from theano import function, config, shared, gpuarray
import theano.tensor as T
import numpy
import time
vlen = 10 * 30 * 768 # 10 x #cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in xrange(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r))
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')
When I run the program, it seems that it recognizes the GPU, but at the end it prints a "Used the cpu" message.
I am skeptical of the "Used the cpu" message:

[idf@localhost python]$ python theanoexam1.py
Mapped name None to device opencl0:0: Intel(R) HD Graphics 5500 BroadWell U-Processor GT2
[GpuElemwise{exp,no_inplace}(<GpuArrayType<None>(float32, vector)>), HostFromGpu(gpuarray)(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 1.231896 seconds
Result is [ 1.23178029 1.61879337 1.52278054 ..., 2.20771813 2.29967737
1.62323284]
Used the cpu
[idf@localhost python]$
For an Intel i3 with four cores, 1.231896 seconds seems fast. Is additional configuration needed to use opencl with theano? Or does this program indeed show that theano is configured to use the GPU?
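One way to ground that skepticism is a rough host-side baseline: time the same exp loop in pure NumPy, with no Theano involved, and compare against the 1.231896 seconds reported above. Absolute numbers depend on the machine, so this is only a sanity check, not a definitive benchmark.

```python
import time
import numpy

# Pure-NumPy baseline for the same workload as the Theano test program.
vlen = 10 * 30 * 768  # same vector length as in the test program
iters = 1000
rng = numpy.random.RandomState(22)
x = numpy.asarray(rng.rand(vlen), dtype='float32')

t0 = time.time()
for _ in range(iters):
    r = numpy.exp(x)
t1 = time.time()
print("NumPy loop took %f seconds" % (t1 - t0))
print(bool((r >= 1.0).all()))  # exp of values in [0, 1) lies in [1, e)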
Answer 0: (score: 0)
First, thank you for posting. I am running on Ubuntu 16.04 using Conda, and I installed libgpuarray manually - all of this is well documented online. I used the same test program as you (thank you for providing it).
So here is my setup:
export MKL_THREADING_LAYER=GNU
The file ~/.theanorc looks like this:
[global]
device = opencl0:0
floatX = float32
[lib]
cnmem=100
When I run the code with
python test.py
I see the output:
DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument
Assuming 131072kB available aperture size.
May lead to reduced performance or incorrect rendering.
get chip id failed: -1 [2]
param: 4, val: 0
Mapped name None to device opencl0:0: Ellesmere
[GpuElemwise{exp,no_inplace}(<GpuArrayType<None>(float32, vector)>), HostFromGpu(gpuarray)(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.282664 seconds
Result is [1.2317803 1.6187935 1.5227805 ... 2.207718 2.2996776 1.6232328]
Used the gpu
I cannot figure out how to use the second GPU (also OpenCL) - but I am glad that at least I have one GPU running.